zorba-coders team mailing list archive
Message #11628
[Merge] lp:~paul-lucas/zorba/feature-ft_bw into lp:zorba
Paul J. Lucas has proposed merging lp:~paul-lucas/zorba/feature-ft_bw into lp:zorba.
Requested reviews:
Paul J. Lucas (paul-lucas)
Related bugs:
Bug #1014999 in Zorba: "Implement full-text black/white feature"
https://bugs.launchpad.net/zorba/+bug/1014999
For more details, see:
https://code.launchpad.net/~paul-lucas/zorba/feature-ft_bw/+merge/112811
Added tokenize-nodes() function.
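As a rough illustration of the include/exclude semantics, here is a minimal, self-contained sketch. It is not the code in this branch: the Node struct and this free-standing tokenize_nodes() are hypothetical stand-ins for Zorba's store items and iterator, and the sketch leaves out languages, xml:lang handling, and paragraph/sentence numbering. It only shows the worklist idea: children are queued in document order, excluded nodes are dropped together with their subtrees, and comments are tokenized only when included explicitly.

```cpp
#include <cassert>
#include <deque>
#include <set>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical stand-in for a store node; NOT Zorba's store::Item.
struct Node {
  enum Kind { Element, Text, Comment };
  Kind kind;
  std::string text;                   // used by Text and Comment nodes
  std::vector<Node const*> children;  // used by Element nodes
};

// Returns the whitespace-separated tokens of everything under `includes`,
// skipping any node listed in `excludes` together with its descendants.
std::vector<std::string> tokenize_nodes(
    std::vector<Node const*> const &includes,
    std::set<Node const*> const &excludes ) {
  std::vector<std::string> tokens;
  std::deque<Node const*> work( includes.begin(), includes.end() );
  while ( !work.empty() ) {
    Node const *const n = work.front();
    work.pop_front();
    assert( n );
    if ( excludes.count( n ) )
      continue;                       // dropping n also drops its subtree
    if ( n->kind == Node::Element ) {
      // Queue the children at the front, in order, to keep document order.
      auto pos = work.begin();
      for ( Node const *child : n->children ) {
        if ( child->kind == Node::Comment )
          continue;                   // comments are never included implicitly
        pos = work.insert( pos, child );
        ++pos;
      }
      continue;
    }
    // Text nodes, and comments that were included explicitly.
    std::istringstream in( n->text );
    for ( std::string word; in >> word; )
      tokens.push_back( word );
  }
  return tokens;
}
```

With a root element holding some text, a comment, and a nested element, passing the nested element as an exclude drops its text from the result, while passing the comment itself as an include tokenizes it.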
--
https://code.launchpad.net/~paul-lucas/zorba/feature-ft_bw/+merge/112811
Your team Zorba Coders is subscribed to branch lp:zorba.
=== modified file 'ChangeLog'
--- ChangeLog 2012-06-29 13:25:20 +0000
+++ ChangeLog 2012-06-29 16:44:26 +0000
@@ -4,8 +4,10 @@
version 2.x
New Features:
+
* Item::isSeekable API extension for streamable content (xs:string and xs:base64Binary).
* Implemented the latest W3C specification for the group by clause
+ * Added ft:tokenize-nodes() function to full-text module
* New XQuery 3.0 functions
- fn:parse-xml-fragment#1
* Added support for transient maps to the http://www.zorba-xquery.com/modules/store/data-structures/unordered-map module.
=== modified file 'include/zorba/tokenizer.h'
--- include/zorba/tokenizer.h 2012-06-28 04:14:03 +0000
+++ include/zorba/tokenizer.h 2012-06-29 16:44:26 +0000
@@ -79,7 +79,7 @@
/**
* This member-function is called whenever an item that is being tokenized
- * is entered or exited.
+ * is entered or exited. The default implementation does nothing.
*
* @param item The item being entered or exited.
* @param entering If \c true, the item is being entered; if \c false, the
=== modified file 'modules/com/zorba-xquery/www/modules/full-text.xq'
--- modules/com/zorba-xquery/www/modules/full-text.xq 2012-06-28 04:14:03 +0000
+++ modules/com/zorba-xquery/www/modules/full-text.xq 2012-06-29 16:44:26 +0000
@@ -767,14 +767,14 @@
as xs:string* external;
(:~
- : Tokenizes the given node and all of its descendants.
+ : Tokenizes the given node and all of its descendants.
:
: @param $node The node to tokenize.
: @param $lang The default
: <a href="http://www.w3.org/TR/xmlschema-2/#language">language</a>
: of <code>$node</code>.
: @return a (possibly empty) sequence of tokens.
- : @error err:FTST0009 if <code>$lang</code> is not supported in general.
+ : @error err:FTST0009 if <code>$lang</code> is not supported.
: @example test/rbkt/Queries/zorba/fulltext/ft-module-tokenize-node-1.xq
:)
declare function ft:tokenize-node( $node as node(), $lang as xs:language )
@@ -784,12 +784,11 @@
: Tokenizes the given node and all of its descendants.
:
: @param $node The node to tokenize.
- : The document's default
+ : The node's default
: <a href="http://www.w3.org/TR/xmlschema-2/#language">language</a>
: is assumed to be the one returned by <code>ft:current-lang()</code>.
: @return a (possibly empty) sequence of tokens.
- : @error err:FTST0009 if <code>ft:current-lang()</code> is not supported in
- : general.
+ : @error err:FTST0009 if <code>ft:current-lang()</code> is not supported.
: @example test/rbkt/Queries/zorba/fulltext/ft-module-tokenize-node-2.xq
: @example test/rbkt/Queries/zorba/fulltext/ft-module-tokenize-node-3.xq
: @example test/rbkt/Queries/zorba/fulltext/ft-module-tokenize-node-4.xq
@@ -798,10 +797,47 @@
as element(ft-schema:token)* external;
(:~
+ : Tokenizes the set of nodes comprising <code>$includes</code> (and all of
+ : their descendants) but excluding <code>$excludes</code> (and all of their
+ : descendants), if any.
+ :
+ : @param $includes The set of nodes (and their descendants) to include.
+ : The default
+ : <a href="http://www.w3.org/TR/xmlschema-2/#language">language</a>
+ : is assumed to be the one returned by <code>ft:current-lang()</code>.
+ : @param $excludes The set of nodes (and their descendants) to exclude.
+ : @return a (possibly empty) sequence of tokens.
+ : @error err:FTST0009 if <code>ft:current-lang()</code> is not supported.
+ : @example test/rbkt/Queries/zorba/fulltext/ft-module-tokenize-nodes-1.xq
+ :)
+declare function ft:tokenize-nodes( $includes as node()+,
+ $excludes as node()* )
+ as element(ft-schema:token)* external;
+
+(:~
+ : Tokenizes the set of nodes comprising <code>$includes</code> (and all of
+ : their descendants) but excluding <code>$excludes</code> (and all of their
+ : descendants), if any.
+ :
+ : @param $includes The set of nodes (and their descendants) to include.
+ : @param $excludes The set of nodes (and their descendants) to exclude.
+ : @param $lang The default
+ : <a href="http://www.w3.org/TR/xmlschema-2/#language">language</a>
+ : for nodes.
+ : @return a (possibly empty) sequence of tokens.
+ : @error err:FTST0009 if <code>$lang</code> is not supported.
+ : @example test/rbkt/Queries/zorba/fulltext/ft-module-tokenize-nodes-1.xq
+ :)
+declare function ft:tokenize-nodes( $includes as node()+,
+ $excludes as node()*,
+ $lang as xs:language )
+ as element(ft-schema:token)* external;
+
+(:~
: Tokenizes the given string.
:
: @param $string The string to tokenize.
- : @param $lang The default
+ : @param $lang The
: <a href="http://www.w3.org/TR/xmlschema-2/#language">language</a>
: of <code>$string</code>.
: @return a (possibly empty) sequence of tokens.
@@ -816,7 +852,7 @@
: Tokenizes the given string.
:
: @param $string The string to tokenize.
- : The string's default
+ : The string's
: <a href="http://www.w3.org/TR/xmlschema-2/#language">language</a>
: is assumed to be the one returned by <code>ft:current-lang()</code>.
: @return a (possibly empty) sequence of tokens.
=== modified file 'src/functions/func_ft_module_impl.cpp'
--- src/functions/func_ft_module_impl.cpp 2012-06-28 04:14:03 +0000
+++ src/functions/func_ft_module_impl.cpp 2012-06-29 16:44:26 +0000
@@ -36,6 +36,17 @@
}
+PlanIter_t full_text_tokenize_nodes::codegen(
+ CompilerCB*,
+ static_context* sctx,
+ const QueryLoc& loc,
+ std::vector<PlanIter_t>& argv,
+ expr& ann) const
+{
+ return new TokenizeNodesIterator(sctx, loc, argv);
+}
+
+
PlanIter_t full_text_tokenizer_properties::codegen(
CompilerCB*,
static_context* sctx,
@@ -59,7 +70,6 @@
#endif // ZORBA_NO_FULL_TEXT
-
///////////////////////////////////////////////////////////////////////////////
void populate_context_ft_module_impl(static_context* sctx)
@@ -105,6 +115,25 @@
tokenize_return_type),
FunctionConsts::FULL_TEXT_TOKENIZE_NODE_2);
}
+ {
+ DECL_WITH_KIND(sctx,
+ full_text_tokenize_nodes,
+ (createQName( FT_MODULE_NS, "", "tokenize-nodes"),
+ GENV_TYPESYSTEM.ANY_NODE_TYPE_PLUS,
+ GENV_TYPESYSTEM.ANY_NODE_TYPE_STAR,
+ tokenize_return_type),
+ FunctionConsts::FULL_TEXT_TOKENIZE_NODES_2);
+ }
+ {
+ DECL_WITH_KIND(sctx,
+ full_text_tokenize_nodes,
+ (createQName( FT_MODULE_NS, "", "tokenize-nodes"),
+ GENV_TYPESYSTEM.ANY_NODE_TYPE_PLUS,
+ GENV_TYPESYSTEM.ANY_NODE_TYPE_STAR,
+ GENV_TYPESYSTEM.LANGUAGE_TYPE_ONE,
+ tokenize_return_type),
+ FunctionConsts::FULL_TEXT_TOKENIZE_NODES_3);
+ }
xqtref_t tokenizer_properties_return_type =
GENV_TYPESYSTEM.create_node_type(store::StoreConsts::elementNode,
@@ -128,10 +157,10 @@
tokenizer_properties_return_type),
FunctionConsts::FULL_TEXT_TOKENIZER_PROPERTIES_1);
}
-#endif // ZORBA_NO_FULL_TEXT
+#endif /* ZORBA_NO_FULL_TEXT */
}
-
+///////////////////////////////////////////////////////////////////////////////
} // namespace zorba
/* vim:set et sw=2 ts=2: */
=== modified file 'src/functions/func_ft_module_impl.h'
--- src/functions/func_ft_module_impl.h 2012-06-28 04:14:03 +0000
+++ src/functions/func_ft_module_impl.h 2012-06-29 16:44:26 +0000
@@ -49,6 +49,26 @@
};
+//full-text:tokenize-nodes
+class full_text_tokenize_nodes : public function
+{
+public:
+ full_text_tokenize_nodes(const signature& sig,
+ FunctionConsts::FunctionKind kind) :
+ function(sig, kind)
+ {
+
+ }
+
+ // Mark the function as accessing the dyn ctx so that it won't be
+ // const-folded. We must prevent const-folding because the function
+ // uses the store to get access to the tokenizer provider.
+ bool accessesDynCtx() const { return true; }
+
+ CODEGEN_DECL();
+};
+
+
//full-text:tokenizer-properties
class full_text_tokenizer_properties : public function
{
=== modified file 'src/functions/function_consts.h'
--- src/functions/function_consts.h 2012-06-28 04:14:03 +0000
+++ src/functions/function_consts.h 2012-06-29 16:44:26 +0000
@@ -238,7 +238,9 @@
FULL_TEXT_TOKENIZER_PROPERTIES_0,
FULL_TEXT_TOKENIZE_NODE_2,
FULL_TEXT_TOKENIZE_NODE_1,
-#endif
+ FULL_TEXT_TOKENIZE_NODES_3,
+ FULL_TEXT_TOKENIZE_NODES_2,
+#endif /* ZORBA_NO_FULL_TEXT */
#include "functions/function_enum.h"
=== modified file 'src/runtime/full_text/CMakeLists.txt'
--- src/runtime/full_text/CMakeLists.txt 2012-06-28 04:14:03 +0000
+++ src/runtime/full_text/CMakeLists.txt 2012-06-29 16:44:26 +0000
@@ -41,6 +41,7 @@
thesaurus.cpp
tokenizer.cpp
default_tokenizer.cpp
+ ft_module_util.cpp
ft_module.cpp
)
=== modified file 'src/runtime/full_text/apply.h'
--- src/runtime/full_text/apply.h 2012-06-28 04:14:03 +0000
+++ src/runtime/full_text/apply.h 2012-06-29 16:44:26 +0000
@@ -24,6 +24,8 @@
namespace zorba {
+///////////////////////////////////////////////////////////////////////////////
+
void apply_ftand( ft_all_matches const&, ft_all_matches const&,
ft_all_matches &result );
@@ -52,6 +54,8 @@
void apply_ftwindow( ft_all_matches const&, ft_int window_size, ft_unit::type,
ft_all_matches &result );
+///////////////////////////////////////////////////////////////////////////////
+
} // namespace zorba
#endif /* ZORBA_FULL_TEXT_APPLY_H */
/* vim:set et sw=2 ts=2: */
=== modified file 'src/runtime/full_text/ft_module_impl.cpp'
--- src/runtime/full_text/ft_module_impl.cpp 2012-06-28 04:14:03 +0000
+++ src/runtime/full_text/ft_module_impl.cpp 2012-06-29 16:44:26 +0000
@@ -13,7 +13,7 @@
* See the License for the specific language governing permissions and
* limitations under the License.
*/
-#include "stdafx.h"
+
#include <zorba/config.h>
//
@@ -23,6 +23,8 @@
//
#ifndef ZORBA_NO_FULL_TEXT
+#include "stdafx.h"
+
#include <limits>
#include <typeinfo>
@@ -42,10 +44,12 @@
#include "types/casting.h"
#include "types/typeimpl.h"
#include "types/typeops.h"
+#include "util/stl_util.h"
#include "util/utf8_util.h"
#include "zorbatypes/URI.h"
#include "zorbautils/locale.h"
+#include "ft_module_util.h"
#include "ft_stop_words_set.h"
#include "ft_token_seq_iterator.h"
#include "ft_util.h"
@@ -87,6 +91,85 @@
);
}
+static Tokenizer::ptr get_tokenizer( iso639_1::type lang,
+ Tokenizer::State *t_state,
+ QueryLoc const &loc ) {
+ TokenizerProvider const *const provider = GENV_STORE.getTokenizerProvider();
+ ZORBA_ASSERT( provider );
+ Tokenizer::ptr tokenizer;
+ if ( !provider->getTokenizer( lang, t_state, &tokenizer ) )
+ throw XQUERY_EXCEPTION(
+ err::FTST0009 /* lang not supported */,
+ ERROR_PARAMS(
+ iso639_1::string_of[ lang ], ZED( FTST0009_BadTokenizerLang )
+ ),
+ ERROR_LOC( loc )
+ );
+ return std::move( tokenizer );
+}
+
+static void make_token_element( FTToken const &token,
+ TokenQNames const &qnames,
+ store::Item_t &result ) {
+ zstring base_uri = static_context::ZORBA_FULL_TEXT_FN_NS;
+ store::Item_t item, attr_node, node_name, type_name;
+ store::NsBindings const ns_bindings;
+ zstring value_string;
+
+ type_name = GENV_TYPESYSTEM.XS_UNTYPED_QNAME;
+ node_name = qnames.token;
+ GENV_ITEMFACTORY->createElementNode(
+ result, nullptr, node_name, type_name, false, false,
+ ns_bindings, base_uri
+ );
+
+ if ( token.lang() ) {
+ value_string = iso639_1::string_of[ token.lang() ];
+ GENV_ITEMFACTORY->createString( item, value_string );
+ type_name = GENV_TYPESYSTEM.XS_UNTYPED_QNAME;
+ node_name = qnames.lang;
+ GENV_ITEMFACTORY->createAttributeNode(
+ attr_node, result, node_name, type_name, item
+ );
+ }
+
+ ztd::to_string( token.para(), &value_string );
+ GENV_ITEMFACTORY->createString( item, value_string );
+ type_name = GENV_TYPESYSTEM.XS_UNTYPED_QNAME;
+ node_name = qnames.paragraph;
+ GENV_ITEMFACTORY->createAttributeNode(
+ attr_node, result, node_name, type_name, item
+ );
+
+ ztd::to_string( token.sent(), &value_string );
+ GENV_ITEMFACTORY->createString( item, value_string );
+ type_name = GENV_TYPESYSTEM.XS_UNTYPED_QNAME;
+ node_name = qnames.sentence;
+ GENV_ITEMFACTORY->createAttributeNode(
+ attr_node, result, node_name, type_name, item
+ );
+
+ value_string = token.value();
+ GENV_ITEMFACTORY->createString( item, value_string );
+ type_name = GENV_TYPESYSTEM.XS_UNTYPED_QNAME;
+ node_name = qnames.value;
+ GENV_ITEMFACTORY->createAttributeNode(
+ attr_node, result, node_name, type_name, item
+ );
+
+ if ( store::Item const *const token_item = token.item() ) {
+ if ( GENV_STORE.getNodeReference( item, token_item ) ) {
+ item->getStringValue2( value_string );
+ GENV_ITEMFACTORY->createString( item, value_string );
+ type_name = GENV_TYPESYSTEM.XS_UNTYPED_QNAME;
+ node_name = qnames.node_ref;
+ GENV_ITEMFACTORY->createAttributeNode(
+ attr_node, result, node_name, type_name, item
+ );
+ }
+ }
+}
+
///////////////////////////////////////////////////////////////////////////////
bool CurrentCompareOptionsIterator::nextImpl( store::Item_t &result,
@@ -296,10 +379,9 @@
}
try {
- static_context const *const sctx = getStaticContext();
- ZORBA_ASSERT( sctx );
iso639_1::type const lang = get_lang_from( item, loc );
-
+ static_context const *const sctx = getStaticContext();
+ ZORBA_ASSERT( sctx );
zstring error_msg;
auto_ptr<internal::Resource> rsrc = sctx->resolve_uri(
uri, internal::EntityData::THESAURUS, error_msg
@@ -369,7 +451,6 @@
PlanIteratorState *state;
DEFAULT_STACK_INIT( PlanIteratorState, state, plan_state );
-
consumeNext( item, theChildren[0], plan_state );
item->getStringValue2( word );
utf8::to_lower( word );
@@ -535,45 +616,12 @@
///////////////////////////////////////////////////////////////////////////////
-TokenizeNodeIterator::TokenizeNodeIterator( static_context *sctx,
- QueryLoc const &loc,
- std::vector<PlanIter_t>& children ):
- NaryBaseIterator<TokenizeNodeIterator,TokenizeNodeIteratorState>(sctx, loc, children)
-{
- initMembers();
-}
-
-void TokenizeNodeIterator::initMembers() {
- GENV_ITEMFACTORY->createQName(
- token_qname_, static_context::ZORBA_FULL_TEXT_FN_NS, "", "token" );
-
- GENV_ITEMFACTORY->createQName(
- lang_qname_, "", "", "lang" );
-
- GENV_ITEMFACTORY->createQName(
- para_qname_, "", "", "paragraph" );
-
- GENV_ITEMFACTORY->createQName(
- sent_qname_, "", "", "sentence" );
-
- GENV_ITEMFACTORY->createQName(
- value_qname_, "", "", "value" );
-
- GENV_ITEMFACTORY->createQName(
- ref_qname_, "", "", "node-ref" );
-}
-
bool TokenizeNodeIterator::nextImpl( store::Item_t &result,
PlanState &plan_state ) const {
- store::Item_t node_name, attr_node;
- zstring base_uri;
store::Item_t item;
iso639_1::type lang;
Tokenizer::State t_state;
- store::NsBindings const ns_bindings;
TokenizerProvider const *tokenizer_provider;
- store::Item_t type_name;
- zstring value_string;
TokenizeNodeIteratorState *state;
DEFAULT_STACK_INIT( TokenizeNodeIteratorState, state, plan_state );
@@ -594,66 +642,11 @@
state->doc_item_->getTokens( *tokenizer_provider, t_state, lang );
while ( state->doc_tokens_->hasNext() ) {
- FTToken const *token;
- token = state->doc_tokens_->next();
- ZORBA_ASSERT( token );
-
- base_uri = static_context::ZORBA_FULL_TEXT_FN_NS;
- type_name = GENV_TYPESYSTEM.XS_UNTYPED_QNAME;
- node_name = token_qname_;
- GENV_ITEMFACTORY->createElementNode(
- result, nullptr, node_name, type_name, false, false,
- ns_bindings, base_uri
- );
-
- if ( token->lang() ) {
- value_string = iso639_1::string_of[ token->lang() ];
- GENV_ITEMFACTORY->createString( item, value_string );
- type_name = GENV_TYPESYSTEM.XS_UNTYPED_QNAME;
- node_name = lang_qname_;
- GENV_ITEMFACTORY->createAttributeNode(
- attr_node, result, node_name, type_name, item
- );
- }
-
- ztd::to_string( token->para(), &value_string );
- GENV_ITEMFACTORY->createString( item, value_string );
- type_name = GENV_TYPESYSTEM.XS_UNTYPED_QNAME;
- node_name = para_qname_;
- GENV_ITEMFACTORY->createAttributeNode(
- attr_node, result, node_name, type_name, item
- );
-
- ztd::to_string( token->sent(), &value_string );
- GENV_ITEMFACTORY->createString( item, value_string );
- type_name = GENV_TYPESYSTEM.XS_UNTYPED_QNAME;
- node_name = sent_qname_;
- GENV_ITEMFACTORY->createAttributeNode(
- attr_node, result, node_name, type_name, item
- );
-
- value_string = token->value();
- GENV_ITEMFACTORY->createString( item, value_string );
- type_name = GENV_TYPESYSTEM.XS_UNTYPED_QNAME;
- node_name = value_qname_;
- GENV_ITEMFACTORY->createAttributeNode(
- attr_node, result, node_name, type_name, item
- );
-
- if ( store::Item const *const token_item = token->item() ) {
- if ( GENV_STORE.getNodeReference( item, token_item ) ) {
- item->getStringValue2( value_string );
- GENV_ITEMFACTORY->createString( item, value_string );
- type_name = GENV_TYPESYSTEM.XS_UNTYPED_QNAME;
- node_name = ref_qname_;
- GENV_ITEMFACTORY->createAttributeNode(
- attr_node, result, node_name, type_name, item
- );
- }
- }
-
+ make_token_element(
+ *state->doc_tokens_->next(), state->token_qnames_, result
+ );
STACK_PUSH( true, state );
- } // while
+ }
}
STACK_END( state );
@@ -669,12 +662,140 @@
state->doc_tokens_->reset();
}
-void TokenizeNodeIterator::serialize( serialization::Archiver &ar ) {
- serialize_baseclass(
- ar, (NaryBaseIterator<TokenizeNodeIterator,TokenizeNodeIteratorState>*)this
- );
- if ( !ar.is_serializing_out() )
- initMembers();
+///////////////////////////////////////////////////////////////////////////////
+
+bool TokenizeNodesIterator::nextImpl( store::Item_t &result,
+ PlanState &plan_state ) const {
+ store::Item_t item;
+ iso639_1::type lang;
+ Tokenizer::State t_state;
+ Tokenizer::ptr tokenizer;
+
+ TokenizeNodesIteratorState *state;
+ DEFAULT_STACK_INIT( TokenizeNodesIteratorState, state, plan_state );
+
+ if ( theChildren.size() > 2 ) {
+ consumeNext( item, theChildren[2], plan_state );
+ lang = get_lang_from( item, loc );
+ } else {
+ static_context const *const sctx = getStaticContext();
+ ZORBA_ASSERT( sctx );
+ lang = get_lang_from( sctx );
+ }
+
+ tokenizer = get_tokenizer( lang, &state->t_state_, loc );
+
+ // $includes
+ while ( consumeNext( item, theChildren[0], plan_state ) )
+ state->includes_.push_back( item );
+ state->includes_.push_back( store::Item_t() ); // sentinel
+
+ // $excludes
+ while ( consumeNext( item, theChildren[1], plan_state ) ) {
+ store::Item_t exc_si;
+ GENV_STORE.getStructuralInformation( exc_si, item.getp() );
+ state->excludes_.push_back( exc_si );
+ }
+
+ state->callback_.set_tokens( state->tokens_ );
+ state->langs_.push( lang );
+ state->tokenizers_.push( tokenizer.release() );
+
+ while ( true ) {
+ if ( state->tokens_.empty() ) {
+ if ( state->includes_.empty() )
+ break;
+
+ store::Item_t inc( state->includes_.front() );
+ state->includes_.pop_front();
+ if ( inc.isNull() ) { // sentinel
+ state->langs_.pop();
+ Tokenizer::ptr deleter( ztd::pop_stack( state->tokenizers_ ) );
+ continue;
+ }
+
+ store::Item_t inc_si;
+ GENV_STORE.getStructuralInformation( inc_si, inc.getp() );
+ bool excluded = false;
+ FOR_EACH( vector<store::Item_t>, exc, state->excludes_ ) {
+ if ( inc_si->equals( *exc ) || (*exc)->isInSubtreeOf( inc_si ) ) {
+ excluded = true;
+ break;
+ }
+ }
+ if ( excluded )
+ continue;
+
+ bool add_sentinel = false;
+ switch ( inc->getNodeKind() ) {
+ case store::StoreConsts::elementNode:
+ ++state->t_state_.para;
+ if ( find_lang_attribute( *inc, &lang ) ) {
+ state->langs_.push( lang );
+ tokenizer = get_tokenizer( lang, &state->t_state_, loc );
+ state->tokenizers_.push( tokenizer.release() );
+ add_sentinel = true;
+ }
+ // no break;
+ case store::StoreConsts::documentNode: {
+ list<store::Item_t>::iterator pos = state->includes_.begin();
+ store::Iterator_t i = inc->getChildren();
+ i->open();
+ for ( store::Item_t child; i->next( child ); ) {
+ switch ( child->getNodeKind() ) {
+ case store::StoreConsts::attributeNode:
+ case store::StoreConsts::commentNode:
+ case store::StoreConsts::piNode:
+ continue; // never include these implicitly
+ default:
+ pos = state->includes_.insert( pos, child );
+ ++pos;
+ }
+ }
+ i->close();
+ if ( add_sentinel ) // sentinel
+ state->includes_.insert( pos, store::Item_t() );
+ continue;
+ }
+
+ case store::StoreConsts::attributeNode:
+ case store::StoreConsts::commentNode:
+ case store::StoreConsts::piNode:
+ // tokenize these because they were included explicitly
+ case store::StoreConsts::textNode: {
+ zstring const s( inc->getStringValue() );
+ Item const temp( inc.getp() );
+ state->tokenizers_.top()->tokenize_string(
+ s.data(), s.size(), state->langs_.top(), false, state->callback_,
+ &temp
+ );
+ break;
+ }
+
+ default:
+ break;
+ } // switch
+ continue;
+ } // if ( state->tokens_.empty() )
+
+ make_token_element(
+ state->tokens_.front(), state->token_qnames_, result
+ );
+ state->tokens_.pop_front();
+ STACK_PUSH( true, state );
+ } // while
+
+ STACK_END( state );
+}
+
+void TokenizeNodesIterator::resetImpl( PlanState &plan_state ) const {
+ NaryBaseIterator<TokenizeNodesIterator,TokenizeNodesIteratorState>::
+ resetImpl( plan_state );
+ TokenizeNodesIteratorState *const state =
+ StateTraitsImpl<TokenizeNodesIteratorState>::getState(
+ plan_state, this->theStateOffset
+ );
+ state->doc_tokens_->reset();
}
///////////////////////////////////////////////////////////////////////////////
@@ -689,7 +810,6 @@
Tokenizer::ptr tokenizer;
store::Item_t type_name;
Tokenizer::Properties props;
- TokenizerProvider const *tokenizer_provider;
zstring value_string;
PlanIteratorState *state;
@@ -704,15 +824,7 @@
lang = get_lang_from( sctx );
}
- tokenizer_provider = GENV_STORE.getTokenizerProvider();
- ZORBA_ASSERT( tokenizer_provider );
- if ( !tokenizer_provider->getTokenizer( lang, &t_state, &tokenizer ) )
- throw XQUERY_EXCEPTION(
- err::FTST0009 /* lang not supported */,
- ERROR_PARAMS(
- iso639_1::string_of[ lang ], ZED( FTST0009_BadTokenizerLang )
- )
- );
+ tokenizer = get_tokenizer( lang, &t_state, loc );
tokenizer->properties( &props );
GENV_ITEMFACTORY->createQName(
@@ -840,19 +952,8 @@
}
{ // local scope
- TokenizerProvider const *const tokenizer_provider =
- GENV_STORE.getTokenizerProvider();
- ZORBA_ASSERT( tokenizer_provider );
Tokenizer::State t_state;
- Tokenizer::ptr tokenizer;
- if ( !tokenizer_provider->getTokenizer( lang, &t_state, &tokenizer ) )
- throw XQUERY_EXCEPTION(
- err::FTST0009 /* lang not supported */,
- ERROR_PARAMS(
- iso639_1::string_of[ lang ], ZED( FTST0009_BadTokenizerLang )
- )
- );
-
+ Tokenizer::ptr const tokenizer( get_tokenizer( lang, &t_state, loc ) );
TokenizeStringIteratorCallback callback;
tokenizer->tokenize_string(
value_string.data(), value_string.size(), lang, false, callback
=== added file 'src/runtime/full_text/ft_module_util.cpp'
--- src/runtime/full_text/ft_module_util.cpp 1970-01-01 00:00:00 +0000
+++ src/runtime/full_text/ft_module_util.cpp 2012-06-29 16:44:26 +0000
@@ -0,0 +1,57 @@
+/*
+ * Copyright 2006-2008 The FLWOR Foundation.
+ *
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+#include "api/unmarshaller.h"
+#include "context/static_context.h"
+#include "store/api/item_factory.h"
+#include "system/globalenv.h"
+
+#include "ft_module_util.h"
+
+using namespace std;
+using namespace zorba::locale;
+
+namespace zorba {
+
+///////////////////////////////////////////////////////////////////////////////
+
+void TokenizeNodesCallback::token( char const *utf8_s, size_type utf8_len,
+ iso639_1::type lang, size_type token_no,
+ size_type sent_no, size_type para_no,
+ Item const *api_item ) {
+ store::Item const *const item = Unmarshaller::getInternalItem( *api_item );
+ tokens_->push_back(
+ FTToken( utf8_s, utf8_len, token_no, sent_no, para_no, item )
+ );
+}
+
+///////////////////////////////////////////////////////////////////////////////
+
+TokenQNames::TokenQNames() {
+ GENV_ITEMFACTORY->createQName(
+ token, static_context::ZORBA_FULL_TEXT_FN_NS, "", "token"
+ );
+ GENV_ITEMFACTORY->createQName( lang, "", "", "lang" );
+ GENV_ITEMFACTORY->createQName( paragraph, "", "", "paragraph" );
+ GENV_ITEMFACTORY->createQName( sentence, "", "", "sentence" );
+ GENV_ITEMFACTORY->createQName( value, "", "", "value" );
+ GENV_ITEMFACTORY->createQName( node_ref, "", "", "node-ref" );
+}
+
+///////////////////////////////////////////////////////////////////////////////
+
+} // namespace zorba
+/* vim:set et sw=2 ts=2: */
=== added file 'src/runtime/full_text/ft_module_util.h'
--- src/runtime/full_text/ft_module_util.h 1970-01-01 00:00:00 +0000
+++ src/runtime/full_text/ft_module_util.h 2012-06-29 16:44:26 +0000
@@ -0,0 +1,80 @@
+/*
+ * Copyright 2006-2008 The FLWOR Foundation.
+ *
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+#ifndef ZORBA_FT_MODULE_UTIL_H
+#define ZORBA_FT_MODULE_UTIL_H
+
+//
+// This header (and its related .cpp) is necessary (instead of just putting
+// this code into ft_module.h/.cpp directly) because this header needs to be
+// #include'd into the .cpp generated from the ft_module.xml file.
+//
+
+#include <zorba/tokenizer.h>
+
+#include <deque>
+
+#include "store/api/item.h"
+#include "util/cxx_util.h"
+#include "zorbatypes/ft_token.h"
+
+#include "ft_module_util.h"
+
+namespace zorba {
+
+///////////////////////////////////////////////////////////////////////////////
+
+/**
+ * A %TokenizeNodesCallback is-a Tokenizer::Callback that's used exclusively by
+ * the TokenizeNodesIterator that implements the ft:tokenize-nodes() full-text
+ * module function.
+ */
+class TokenizeNodesCallback : public Tokenizer::Callback {
+public:
+ TokenizeNodesCallback() : tokens_( nullptr ) { }
+ TokenizeNodesCallback( std::deque<FTToken> &tokens ) : tokens_( &tokens ) { }
+
+ void set_tokens( std::deque<FTToken> &tokens ) {
+ tokens_ = &tokens;
+ }
+
+ // inherited
+ void token( char const *utf8_s, size_type utf8_len,
+ locale::iso639_1::type lang, size_type token_no,
+ size_type sent_no, size_type para_no, Item const *item = 0 );
+
+private:
+ std::deque<FTToken> *tokens_;
+};
+
+///////////////////////////////////////////////////////////////////////////////
+
+struct TokenQNames {
+ store::Item_t token;
+ store::Item_t lang;
+ store::Item_t paragraph;
+ store::Item_t sentence;
+ store::Item_t value;
+ store::Item_t node_ref;
+
+ TokenQNames();
+};
+
+///////////////////////////////////////////////////////////////////////////////
+
+} // namespace zorba
+#endif /* ZORBA_FT_MODULE_UTIL_H */
+/* vim:set et sw=2 ts=2: */
=== modified file 'src/runtime/full_text/ft_util.cpp'
--- src/runtime/full_text/ft_util.cpp 2012-04-27 17:07:47 +0000
+++ src/runtime/full_text/ft_util.cpp 2012-06-29 16:44:26 +0000
@@ -19,14 +19,38 @@
#include <stdexcept>
#include "diagnostics/xquery_diagnostics.h"
+#include "zorbamisc/ns_consts.h"
#include "zorbatypes/numconversions.h"
+#include "zorbautils/locale.h"
#include "ft_util.h"
+using namespace zorba::locale;
+
namespace zorba {
///////////////////////////////////////////////////////////////////////////////
+bool find_lang_attribute( store::Item const &item, iso639_1::type *lang ) {
+ bool found_lang = false;
+ if ( item.getNodeKind() == store::StoreConsts::elementNode ) {
+ store::Iterator_t i( item.getAttributes() );
+ i->open();
+ for ( store::Item_t attr; i->next( attr ); ) {
+ store::Item const *const qname = attr->getNodeName();
+ if ( qname &&
+ qname->getLocalName() == "lang" &&
+ qname->getNamespace() == XML_NS ) {
+ *lang = locale::find_lang( attr->getStringValue().c_str() );
+ found_lang = true;
+ break;
+ }
+ }
+ i->close();
+ }
+ return found_lang;
+}
+
ft_int to_ft_int( xs_integer const &i ) {
try {
return to_xs_unsignedInt( i );
=== modified file 'src/runtime/full_text/ft_util.h'
--- src/runtime/full_text/ft_util.h 2012-06-28 04:14:03 +0000
+++ src/runtime/full_text/ft_util.h 2012-06-29 16:44:26 +0000
@@ -17,11 +17,13 @@
#ifndef ZORBA_FULL_TEXT_UTIL_H
#define ZORBA_FULL_TEXT_UTIL_H
+#include <zorba/item.h>
#include <zorba/locale.h>
#include "compiler/expression/ftnode.h"
+#include "store/api/item.h"
+#include "util/cxx_util.h"
#include "zorbatypes/schema_types.h"
-#include "util/cxx_util.h"
#include "ft_match.h"
@@ -44,6 +46,16 @@
////////// Functions //////////////////////////////////////////////////////////
/**
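+ * Finds the \c xml:lang attribute, if any, of the given element.
+ *
+ * @param item The item to check; only element nodes are examined.
+ * @param lang A pointer to the language to set if the attribute is found.
+ * @return Returns \c true only if \a item has an \c xml:lang attribute.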
+ */
+bool find_lang_attribute( store::Item const &item,
+ locale::iso639_1::type *lang );
+
+/**
* Gets the language from the given ftmatch_options, if any.
*
* @param options The ftmatch_options to get the language from. This may be \c
@@ -98,6 +110,8 @@
*/
ft_int to_ft_int( xs_integer const &i );
+///////////////////////////////////////////////////////////////////////////////
+
} // namespace zorba
#endif /* ZORBA_FULL_TEXT_UTIL_H */
/* vim:set et sw=2 ts=2: */
=== added file 'src/runtime/full_text/pregenerated/ft_module.cpp'
--- src/runtime/full_text/pregenerated/ft_module.cpp 1970-01-01 00:00:00 +0000
+++ src/runtime/full_text/pregenerated/ft_module.cpp 2012-06-29 16:44:26 +0000
@@ -0,0 +1,506 @@
+/*
+ * Copyright 2006-2008 The FLWOR Foundation.
+ *
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+// ******************************************
+// * *
+// * THIS IS A GENERATED FILE. DO NOT EDIT! *
+// * SEE .xml FILE WITH SAME NAME *
+// * *
+// ******************************************
+
+#include "stdafx.h"
+#include "zorbatypes/rchandle.h"
+#include "zorbatypes/zstring.h"
+#include "runtime/visitors/planiter_visitor.h"
+#include "runtime/full_text/ft_module.h"
+#include "system/globalenv.h"
+
+
+#include "store/api/iterator.h"
+
+namespace zorba {
+
+#ifndef ZORBA_NO_FULL_TEXT
+// <CurrentCompareOptionsIterator>
+SERIALIZABLE_CLASS_VERSIONS(CurrentCompareOptionsIterator)
+
+void CurrentCompareOptionsIterator::serialize(::zorba::serialization::Archiver& ar)
+{
+ serialize_baseclass(ar,
+ (NaryBaseIterator<CurrentCompareOptionsIterator, PlanIteratorState>*)this);
+}
+
+
+void CurrentCompareOptionsIterator::accept(PlanIterVisitor& v) const
+{
+ v.beginVisit(*this);
+
+ std::vector<PlanIter_t>::const_iterator lIter = theChildren.begin();
+ std::vector<PlanIter_t>::const_iterator lEnd = theChildren.end();
+ for ( ; lIter != lEnd; ++lIter ){
+ (*lIter)->accept(v);
+ }
+
+ v.endVisit(*this);
+}
+
+CurrentCompareOptionsIterator::~CurrentCompareOptionsIterator() {}
+
+// </CurrentCompareOptionsIterator>
+
+#endif
+#ifndef ZORBA_NO_FULL_TEXT
+// <CurrentLangIterator>
+SERIALIZABLE_CLASS_VERSIONS(CurrentLangIterator)
+
+void CurrentLangIterator::serialize(::zorba::serialization::Archiver& ar)
+{
+ serialize_baseclass(ar,
+ (NaryBaseIterator<CurrentLangIterator, PlanIteratorState>*)this);
+}
+
+
+void CurrentLangIterator::accept(PlanIterVisitor& v) const
+{
+ v.beginVisit(*this);
+
+ std::vector<PlanIter_t>::const_iterator lIter = theChildren.begin();
+ std::vector<PlanIter_t>::const_iterator lEnd = theChildren.end();
+ for ( ; lIter != lEnd; ++lIter ){
+ (*lIter)->accept(v);
+ }
+
+ v.endVisit(*this);
+}
+
+CurrentLangIterator::~CurrentLangIterator() {}
+
+// </CurrentLangIterator>
+
+#endif
+#ifndef ZORBA_NO_FULL_TEXT
+// <HostLangIterator>
+SERIALIZABLE_CLASS_VERSIONS(HostLangIterator)
+
+void HostLangIterator::serialize(::zorba::serialization::Archiver& ar)
+{
+ serialize_baseclass(ar,
+ (NaryBaseIterator<HostLangIterator, PlanIteratorState>*)this);
+}
+
+
+void HostLangIterator::accept(PlanIterVisitor& v) const
+{
+ v.beginVisit(*this);
+
+ std::vector<PlanIter_t>::const_iterator lIter = theChildren.begin();
+ std::vector<PlanIter_t>::const_iterator lEnd = theChildren.end();
+ for ( ; lIter != lEnd; ++lIter ){
+ (*lIter)->accept(v);
+ }
+
+ v.endVisit(*this);
+}
+
+HostLangIterator::~HostLangIterator() {}
+
+// </HostLangIterator>
+
+#endif
+#ifndef ZORBA_NO_FULL_TEXT
+// <IsStemLangSupportedIterator>
+SERIALIZABLE_CLASS_VERSIONS(IsStemLangSupportedIterator)
+
+void IsStemLangSupportedIterator::serialize(::zorba::serialization::Archiver& ar)
+{
+ serialize_baseclass(ar,
+ (NaryBaseIterator<IsStemLangSupportedIterator, PlanIteratorState>*)this);
+}
+
+
+void IsStemLangSupportedIterator::accept(PlanIterVisitor& v) const
+{
+ v.beginVisit(*this);
+
+ std::vector<PlanIter_t>::const_iterator lIter = theChildren.begin();
+ std::vector<PlanIter_t>::const_iterator lEnd = theChildren.end();
+ for ( ; lIter != lEnd; ++lIter ){
+ (*lIter)->accept(v);
+ }
+
+ v.endVisit(*this);
+}
+
+IsStemLangSupportedIterator::~IsStemLangSupportedIterator() {}
+
+// </IsStemLangSupportedIterator>
+
+#endif
+#ifndef ZORBA_NO_FULL_TEXT
+// <IsStopWordIterator>
+SERIALIZABLE_CLASS_VERSIONS(IsStopWordIterator)
+
+void IsStopWordIterator::serialize(::zorba::serialization::Archiver& ar)
+{
+ serialize_baseclass(ar,
+ (NaryBaseIterator<IsStopWordIterator, PlanIteratorState>*)this);
+}
+
+
+void IsStopWordIterator::accept(PlanIterVisitor& v) const
+{
+ v.beginVisit(*this);
+
+ std::vector<PlanIter_t>::const_iterator lIter = theChildren.begin();
+ std::vector<PlanIter_t>::const_iterator lEnd = theChildren.end();
+ for ( ; lIter != lEnd; ++lIter ){
+ (*lIter)->accept(v);
+ }
+
+ v.endVisit(*this);
+}
+
+IsStopWordIterator::~IsStopWordIterator() {}
+
+// </IsStopWordIterator>
+
+#endif
+#ifndef ZORBA_NO_FULL_TEXT
+// <IsStopWordLangSupportedIterator>
+SERIALIZABLE_CLASS_VERSIONS(IsStopWordLangSupportedIterator)
+
+void IsStopWordLangSupportedIterator::serialize(::zorba::serialization::Archiver& ar)
+{
+ serialize_baseclass(ar,
+ (NaryBaseIterator<IsStopWordLangSupportedIterator, PlanIteratorState>*)this);
+}
+
+
+void IsStopWordLangSupportedIterator::accept(PlanIterVisitor& v) const
+{
+ v.beginVisit(*this);
+
+ std::vector<PlanIter_t>::const_iterator lIter = theChildren.begin();
+ std::vector<PlanIter_t>::const_iterator lEnd = theChildren.end();
+ for ( ; lIter != lEnd; ++lIter ){
+ (*lIter)->accept(v);
+ }
+
+ v.endVisit(*this);
+}
+
+IsStopWordLangSupportedIterator::~IsStopWordLangSupportedIterator() {}
+
+// </IsStopWordLangSupportedIterator>
+
+#endif
+#ifndef ZORBA_NO_FULL_TEXT
+// <IsThesaurusLangSupportedIterator>
+SERIALIZABLE_CLASS_VERSIONS(IsThesaurusLangSupportedIterator)
+
+void IsThesaurusLangSupportedIterator::serialize(::zorba::serialization::Archiver& ar)
+{
+ serialize_baseclass(ar,
+ (NaryBaseIterator<IsThesaurusLangSupportedIterator, PlanIteratorState>*)this);
+}
+
+
+void IsThesaurusLangSupportedIterator::accept(PlanIterVisitor& v) const
+{
+ v.beginVisit(*this);
+
+ std::vector<PlanIter_t>::const_iterator lIter = theChildren.begin();
+ std::vector<PlanIter_t>::const_iterator lEnd = theChildren.end();
+ for ( ; lIter != lEnd; ++lIter ){
+ (*lIter)->accept(v);
+ }
+
+ v.endVisit(*this);
+}
+
+IsThesaurusLangSupportedIterator::~IsThesaurusLangSupportedIterator() {}
+
+// </IsThesaurusLangSupportedIterator>
+
+#endif
+#ifndef ZORBA_NO_FULL_TEXT
+// <IsTokenizerLangSupportedIterator>
+SERIALIZABLE_CLASS_VERSIONS(IsTokenizerLangSupportedIterator)
+
+void IsTokenizerLangSupportedIterator::serialize(::zorba::serialization::Archiver& ar)
+{
+ serialize_baseclass(ar,
+ (NaryBaseIterator<IsTokenizerLangSupportedIterator, PlanIteratorState>*)this);
+}
+
+
+void IsTokenizerLangSupportedIterator::accept(PlanIterVisitor& v) const
+{
+ v.beginVisit(*this);
+
+ std::vector<PlanIter_t>::const_iterator lIter = theChildren.begin();
+ std::vector<PlanIter_t>::const_iterator lEnd = theChildren.end();
+ for ( ; lIter != lEnd; ++lIter ){
+ (*lIter)->accept(v);
+ }
+
+ v.endVisit(*this);
+}
+
+IsTokenizerLangSupportedIterator::~IsTokenizerLangSupportedIterator() {}
+
+// </IsTokenizerLangSupportedIterator>
+
+#endif
+#ifndef ZORBA_NO_FULL_TEXT
+// <StemIterator>
+SERIALIZABLE_CLASS_VERSIONS(StemIterator)
+
+void StemIterator::serialize(::zorba::serialization::Archiver& ar)
+{
+ serialize_baseclass(ar,
+ (NaryBaseIterator<StemIterator, PlanIteratorState>*)this);
+}
+
+
+void StemIterator::accept(PlanIterVisitor& v) const
+{
+ v.beginVisit(*this);
+
+ std::vector<PlanIter_t>::const_iterator lIter = theChildren.begin();
+ std::vector<PlanIter_t>::const_iterator lEnd = theChildren.end();
+ for ( ; lIter != lEnd; ++lIter ){
+ (*lIter)->accept(v);
+ }
+
+ v.endVisit(*this);
+}
+
+StemIterator::~StemIterator() {}
+
+// </StemIterator>
+
+#endif
+#ifndef ZORBA_NO_FULL_TEXT
+// <StripDiacriticsIterator>
+SERIALIZABLE_CLASS_VERSIONS(StripDiacriticsIterator)
+
+void StripDiacriticsIterator::serialize(::zorba::serialization::Archiver& ar)
+{
+ serialize_baseclass(ar,
+ (NaryBaseIterator<StripDiacriticsIterator, PlanIteratorState>*)this);
+}
+
+
+void StripDiacriticsIterator::accept(PlanIterVisitor& v) const
+{
+ v.beginVisit(*this);
+
+ std::vector<PlanIter_t>::const_iterator lIter = theChildren.begin();
+ std::vector<PlanIter_t>::const_iterator lEnd = theChildren.end();
+ for ( ; lIter != lEnd; ++lIter ){
+ (*lIter)->accept(v);
+ }
+
+ v.endVisit(*this);
+}
+
+StripDiacriticsIterator::~StripDiacriticsIterator() {}
+
+// </StripDiacriticsIterator>
+
+#endif
+#ifndef ZORBA_NO_FULL_TEXT
+// <ThesaurusLookupIterator>
+SERIALIZABLE_CLASS_VERSIONS(ThesaurusLookupIterator)
+
+void ThesaurusLookupIterator::serialize(::zorba::serialization::Archiver& ar)
+{
+ serialize_baseclass(ar,
+ (NaryBaseIterator<ThesaurusLookupIterator, ThesaurusLookupIteratorState>*)this);
+}
+
+
+void ThesaurusLookupIterator::accept(PlanIterVisitor& v) const
+{
+ v.beginVisit(*this);
+
+ std::vector<PlanIter_t>::const_iterator lIter = theChildren.begin();
+ std::vector<PlanIter_t>::const_iterator lEnd = theChildren.end();
+ for ( ; lIter != lEnd; ++lIter ){
+ (*lIter)->accept(v);
+ }
+
+ v.endVisit(*this);
+}
+
+ThesaurusLookupIterator::~ThesaurusLookupIterator() {}
+
+ThesaurusLookupIteratorState::ThesaurusLookupIteratorState() {}
+
+ThesaurusLookupIteratorState::~ThesaurusLookupIteratorState() {}
+
+
+void ThesaurusLookupIteratorState::reset(PlanState& planState) {
+ PlanIteratorState::reset(planState);
+}
+// </ThesaurusLookupIterator>
+
+#endif
+#ifndef ZORBA_NO_FULL_TEXT
+// <TokenizeNodeIterator>
+SERIALIZABLE_CLASS_VERSIONS(TokenizeNodeIterator)
+
+void TokenizeNodeIterator::serialize(::zorba::serialization::Archiver& ar)
+{
+ serialize_baseclass(ar,
+ (NaryBaseIterator<TokenizeNodeIterator, TokenizeNodeIteratorState>*)this);
+}
+
+
+void TokenizeNodeIterator::accept(PlanIterVisitor& v) const
+{
+ v.beginVisit(*this);
+
+ std::vector<PlanIter_t>::const_iterator lIter = theChildren.begin();
+ std::vector<PlanIter_t>::const_iterator lEnd = theChildren.end();
+ for ( ; lIter != lEnd; ++lIter ){
+ (*lIter)->accept(v);
+ }
+
+ v.endVisit(*this);
+}
+
+TokenizeNodeIterator::~TokenizeNodeIterator() {}
+
+TokenizeNodeIteratorState::TokenizeNodeIteratorState() {}
+
+TokenizeNodeIteratorState::~TokenizeNodeIteratorState() {}
+
+
+void TokenizeNodeIteratorState::reset(PlanState& planState) {
+ PlanIteratorState::reset(planState);
+}
+// </TokenizeNodeIterator>
+
+#endif
+#ifndef ZORBA_NO_FULL_TEXT
+// <TokenizeNodesIterator>
+SERIALIZABLE_CLASS_VERSIONS(TokenizeNodesIterator)
+
+void TokenizeNodesIterator::serialize(::zorba::serialization::Archiver& ar)
+{
+ serialize_baseclass(ar,
+ (NaryBaseIterator<TokenizeNodesIterator, TokenizeNodesIteratorState>*)this);
+}
+
+
+void TokenizeNodesIterator::accept(PlanIterVisitor& v) const
+{
+ v.beginVisit(*this);
+
+ std::vector<PlanIter_t>::const_iterator lIter = theChildren.begin();
+ std::vector<PlanIter_t>::const_iterator lEnd = theChildren.end();
+ for ( ; lIter != lEnd; ++lIter ){
+ (*lIter)->accept(v);
+ }
+
+ v.endVisit(*this);
+}
+
+TokenizeNodesIterator::~TokenizeNodesIterator() {}
+
+TokenizeNodesIteratorState::TokenizeNodesIteratorState() {}
+
+TokenizeNodesIteratorState::~TokenizeNodesIteratorState() {}
+
+
+void TokenizeNodesIteratorState::reset(PlanState& planState) {
+ PlanIteratorState::reset(planState);
+}
+// </TokenizeNodesIterator>
+
+#endif
+#ifndef ZORBA_NO_FULL_TEXT
+// <TokenizerPropertiesIterator>
+SERIALIZABLE_CLASS_VERSIONS(TokenizerPropertiesIterator)
+
+void TokenizerPropertiesIterator::serialize(::zorba::serialization::Archiver& ar)
+{
+ serialize_baseclass(ar,
+ (NaryBaseIterator<TokenizerPropertiesIterator, PlanIteratorState>*)this);
+}
+
+
+void TokenizerPropertiesIterator::accept(PlanIterVisitor& v) const
+{
+ v.beginVisit(*this);
+
+ std::vector<PlanIter_t>::const_iterator lIter = theChildren.begin();
+ std::vector<PlanIter_t>::const_iterator lEnd = theChildren.end();
+ for ( ; lIter != lEnd; ++lIter ){
+ (*lIter)->accept(v);
+ }
+
+ v.endVisit(*this);
+}
+
+TokenizerPropertiesIterator::~TokenizerPropertiesIterator() {}
+
+// </TokenizerPropertiesIterator>
+
+#endif
+#ifndef ZORBA_NO_FULL_TEXT
+// <TokenizeStringIterator>
+SERIALIZABLE_CLASS_VERSIONS(TokenizeStringIterator)
+
+void TokenizeStringIterator::serialize(::zorba::serialization::Archiver& ar)
+{
+ serialize_baseclass(ar,
+ (NaryBaseIterator<TokenizeStringIterator, TokenizeStringIteratorState>*)this);
+}
+
+
+void TokenizeStringIterator::accept(PlanIterVisitor& v) const
+{
+ v.beginVisit(*this);
+
+ std::vector<PlanIter_t>::const_iterator lIter = theChildren.begin();
+ std::vector<PlanIter_t>::const_iterator lEnd = theChildren.end();
+ for ( ; lIter != lEnd; ++lIter ){
+ (*lIter)->accept(v);
+ }
+
+ v.endVisit(*this);
+}
+
+TokenizeStringIterator::~TokenizeStringIterator() {}
+
+TokenizeStringIteratorState::TokenizeStringIteratorState() {}
+
+TokenizeStringIteratorState::~TokenizeStringIteratorState() {}
+
+
+void TokenizeStringIteratorState::reset(PlanState& planState) {
+ PlanIteratorState::reset(planState);
+}
+// </TokenizeStringIterator>
+
+#endif
+
+}
+
+
=== removed file 'src/runtime/full_text/pregenerated/ft_module.cpp'
--- src/runtime/full_text/pregenerated/ft_module.cpp 2012-05-22 19:09:20 +0000
+++ src/runtime/full_text/pregenerated/ft_module.cpp 1970-01-01 00:00:00 +0000
@@ -1,463 +0,0 @@
-/*
- * Copyright 2006-2008 The FLWOR Foundation.
- *
- * Licensed under the Apache License, Version 2.0 (the "License");
- * you may not use this file except in compliance with the License.
- * You may obtain a copy of the License at
- *
- * http://www.apache.org/licenses/LICENSE-2.0
- *
- * Unless required by applicable law or agreed to in writing, software
- * distributed under the License is distributed on an "AS IS" BASIS,
- * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
- * See the License for the specific language governing permissions and
- * limitations under the License.
- */
-
-// ******************************************
-// * *
-// * THIS IS A GENERATED FILE. DO NOT EDIT! *
-// * SEE .xml FILE WITH SAME NAME *
-// * *
-// ******************************************
-
-#include "stdafx.h"
-#include "zorbatypes/rchandle.h"
-#include "zorbatypes/zstring.h"
-#include "runtime/visitors/planiter_visitor.h"
-#include "runtime/full_text/ft_module.h"
-#include "system/globalenv.h"
-
-
-#include "store/api/iterator.h"
-
-namespace zorba {
-
-#ifndef ZORBA_NO_FULL_TEXT
-// <CurrentCompareOptionsIterator>
-SERIALIZABLE_CLASS_VERSIONS(CurrentCompareOptionsIterator)
-
-void CurrentCompareOptionsIterator::serialize(::zorba::serialization::Archiver& ar)
-{
- serialize_baseclass(ar,
- (NaryBaseIterator<CurrentCompareOptionsIterator, PlanIteratorState>*)this);
-}
-
-
-void CurrentCompareOptionsIterator::accept(PlanIterVisitor& v) const
-{
- v.beginVisit(*this);
-
- std::vector<PlanIter_t>::const_iterator lIter = theChildren.begin();
- std::vector<PlanIter_t>::const_iterator lEnd = theChildren.end();
- for ( ; lIter != lEnd; ++lIter ){
- (*lIter)->accept(v);
- }
-
- v.endVisit(*this);
-}
-
-CurrentCompareOptionsIterator::~CurrentCompareOptionsIterator() {}
-
-// </CurrentCompareOptionsIterator>
-
-#endif
-#ifndef ZORBA_NO_FULL_TEXT
-// <CurrentLangIterator>
-SERIALIZABLE_CLASS_VERSIONS(CurrentLangIterator)
-
-void CurrentLangIterator::serialize(::zorba::serialization::Archiver& ar)
-{
- serialize_baseclass(ar,
- (NaryBaseIterator<CurrentLangIterator, PlanIteratorState>*)this);
-}
-
-
-void CurrentLangIterator::accept(PlanIterVisitor& v) const
-{
- v.beginVisit(*this);
-
- std::vector<PlanIter_t>::const_iterator lIter = theChildren.begin();
- std::vector<PlanIter_t>::const_iterator lEnd = theChildren.end();
- for ( ; lIter != lEnd; ++lIter ){
- (*lIter)->accept(v);
- }
-
- v.endVisit(*this);
-}
-
-CurrentLangIterator::~CurrentLangIterator() {}
-
-// </CurrentLangIterator>
-
-#endif
-#ifndef ZORBA_NO_FULL_TEXT
-// <HostLangIterator>
-SERIALIZABLE_CLASS_VERSIONS(HostLangIterator)
-
-void HostLangIterator::serialize(::zorba::serialization::Archiver& ar)
-{
- serialize_baseclass(ar,
- (NaryBaseIterator<HostLangIterator, PlanIteratorState>*)this);
-}
-
-
-void HostLangIterator::accept(PlanIterVisitor& v) const
-{
- v.beginVisit(*this);
-
- std::vector<PlanIter_t>::const_iterator lIter = theChildren.begin();
- std::vector<PlanIter_t>::const_iterator lEnd = theChildren.end();
- for ( ; lIter != lEnd; ++lIter ){
- (*lIter)->accept(v);
- }
-
- v.endVisit(*this);
-}
-
-HostLangIterator::~HostLangIterator() {}
-
-// </HostLangIterator>
-
-#endif
-#ifndef ZORBA_NO_FULL_TEXT
-// <IsStemLangSupportedIterator>
-SERIALIZABLE_CLASS_VERSIONS(IsStemLangSupportedIterator)
-
-void IsStemLangSupportedIterator::serialize(::zorba::serialization::Archiver& ar)
-{
- serialize_baseclass(ar,
- (NaryBaseIterator<IsStemLangSupportedIterator, PlanIteratorState>*)this);
-}
-
-
-void IsStemLangSupportedIterator::accept(PlanIterVisitor& v) const
-{
- v.beginVisit(*this);
-
- std::vector<PlanIter_t>::const_iterator lIter = theChildren.begin();
- std::vector<PlanIter_t>::const_iterator lEnd = theChildren.end();
- for ( ; lIter != lEnd; ++lIter ){
- (*lIter)->accept(v);
- }
-
- v.endVisit(*this);
-}
-
-IsStemLangSupportedIterator::~IsStemLangSupportedIterator() {}
-
-// </IsStemLangSupportedIterator>
-
-#endif
-#ifndef ZORBA_NO_FULL_TEXT
-// <IsStopWordIterator>
-SERIALIZABLE_CLASS_VERSIONS(IsStopWordIterator)
-
-void IsStopWordIterator::serialize(::zorba::serialization::Archiver& ar)
-{
- serialize_baseclass(ar,
- (NaryBaseIterator<IsStopWordIterator, PlanIteratorState>*)this);
-}
-
-
-void IsStopWordIterator::accept(PlanIterVisitor& v) const
-{
- v.beginVisit(*this);
-
- std::vector<PlanIter_t>::const_iterator lIter = theChildren.begin();
- std::vector<PlanIter_t>::const_iterator lEnd = theChildren.end();
- for ( ; lIter != lEnd; ++lIter ){
- (*lIter)->accept(v);
- }
-
- v.endVisit(*this);
-}
-
-IsStopWordIterator::~IsStopWordIterator() {}
-
-// </IsStopWordIterator>
-
-#endif
-#ifndef ZORBA_NO_FULL_TEXT
-// <IsStopWordLangSupportedIterator>
-SERIALIZABLE_CLASS_VERSIONS(IsStopWordLangSupportedIterator)
-
-void IsStopWordLangSupportedIterator::serialize(::zorba::serialization::Archiver& ar)
-{
- serialize_baseclass(ar,
- (NaryBaseIterator<IsStopWordLangSupportedIterator, PlanIteratorState>*)this);
-}
-
-
-void IsStopWordLangSupportedIterator::accept(PlanIterVisitor& v) const
-{
- v.beginVisit(*this);
-
- std::vector<PlanIter_t>::const_iterator lIter = theChildren.begin();
- std::vector<PlanIter_t>::const_iterator lEnd = theChildren.end();
- for ( ; lIter != lEnd; ++lIter ){
- (*lIter)->accept(v);
- }
-
- v.endVisit(*this);
-}
-
-IsStopWordLangSupportedIterator::~IsStopWordLangSupportedIterator() {}
-
-// </IsStopWordLangSupportedIterator>
-
-#endif
-#ifndef ZORBA_NO_FULL_TEXT
-// <IsThesaurusLangSupportedIterator>
-SERIALIZABLE_CLASS_VERSIONS(IsThesaurusLangSupportedIterator)
-
-void IsThesaurusLangSupportedIterator::serialize(::zorba::serialization::Archiver& ar)
-{
- serialize_baseclass(ar,
- (NaryBaseIterator<IsThesaurusLangSupportedIterator, PlanIteratorState>*)this);
-}
-
-
-void IsThesaurusLangSupportedIterator::accept(PlanIterVisitor& v) const
-{
- v.beginVisit(*this);
-
- std::vector<PlanIter_t>::const_iterator lIter = theChildren.begin();
- std::vector<PlanIter_t>::const_iterator lEnd = theChildren.end();
- for ( ; lIter != lEnd; ++lIter ){
- (*lIter)->accept(v);
- }
-
- v.endVisit(*this);
-}
-
-IsThesaurusLangSupportedIterator::~IsThesaurusLangSupportedIterator() {}
-
-// </IsThesaurusLangSupportedIterator>
-
-#endif
-#ifndef ZORBA_NO_FULL_TEXT
-// <IsTokenizerLangSupportedIterator>
-SERIALIZABLE_CLASS_VERSIONS(IsTokenizerLangSupportedIterator)
-
-void IsTokenizerLangSupportedIterator::serialize(::zorba::serialization::Archiver& ar)
-{
- serialize_baseclass(ar,
- (NaryBaseIterator<IsTokenizerLangSupportedIterator, PlanIteratorState>*)this);
-}
-
-
-void IsTokenizerLangSupportedIterator::accept(PlanIterVisitor& v) const
-{
- v.beginVisit(*this);
-
- std::vector<PlanIter_t>::const_iterator lIter = theChildren.begin();
- std::vector<PlanIter_t>::const_iterator lEnd = theChildren.end();
- for ( ; lIter != lEnd; ++lIter ){
- (*lIter)->accept(v);
- }
-
- v.endVisit(*this);
-}
-
-IsTokenizerLangSupportedIterator::~IsTokenizerLangSupportedIterator() {}
-
-// </IsTokenizerLangSupportedIterator>
-
-#endif
-#ifndef ZORBA_NO_FULL_TEXT
-// <StemIterator>
-SERIALIZABLE_CLASS_VERSIONS(StemIterator)
-
-void StemIterator::serialize(::zorba::serialization::Archiver& ar)
-{
- serialize_baseclass(ar,
- (NaryBaseIterator<StemIterator, PlanIteratorState>*)this);
-}
-
-
-void StemIterator::accept(PlanIterVisitor& v) const
-{
- v.beginVisit(*this);
-
- std::vector<PlanIter_t>::const_iterator lIter = theChildren.begin();
- std::vector<PlanIter_t>::const_iterator lEnd = theChildren.end();
- for ( ; lIter != lEnd; ++lIter ){
- (*lIter)->accept(v);
- }
-
- v.endVisit(*this);
-}
-
-StemIterator::~StemIterator() {}
-
-// </StemIterator>
-
-#endif
-#ifndef ZORBA_NO_FULL_TEXT
-// <StripDiacriticsIterator>
-SERIALIZABLE_CLASS_VERSIONS(StripDiacriticsIterator)
-
-void StripDiacriticsIterator::serialize(::zorba::serialization::Archiver& ar)
-{
- serialize_baseclass(ar,
- (NaryBaseIterator<StripDiacriticsIterator, PlanIteratorState>*)this);
-}
-
-
-void StripDiacriticsIterator::accept(PlanIterVisitor& v) const
-{
- v.beginVisit(*this);
-
- std::vector<PlanIter_t>::const_iterator lIter = theChildren.begin();
- std::vector<PlanIter_t>::const_iterator lEnd = theChildren.end();
- for ( ; lIter != lEnd; ++lIter ){
- (*lIter)->accept(v);
- }
-
- v.endVisit(*this);
-}
-
-StripDiacriticsIterator::~StripDiacriticsIterator() {}
-
-// </StripDiacriticsIterator>
-
-#endif
-#ifndef ZORBA_NO_FULL_TEXT
-// <ThesaurusLookupIterator>
-SERIALIZABLE_CLASS_VERSIONS(ThesaurusLookupIterator)
-
-void ThesaurusLookupIterator::serialize(::zorba::serialization::Archiver& ar)
-{
- serialize_baseclass(ar,
- (NaryBaseIterator<ThesaurusLookupIterator, ThesaurusLookupIteratorState>*)this);
-}
-
-
-void ThesaurusLookupIterator::accept(PlanIterVisitor& v) const
-{
- v.beginVisit(*this);
-
- std::vector<PlanIter_t>::const_iterator lIter = theChildren.begin();
- std::vector<PlanIter_t>::const_iterator lEnd = theChildren.end();
- for ( ; lIter != lEnd; ++lIter ){
- (*lIter)->accept(v);
- }
-
- v.endVisit(*this);
-}
-
-ThesaurusLookupIterator::~ThesaurusLookupIterator() {}
-
-ThesaurusLookupIteratorState::ThesaurusLookupIteratorState() {}
-
-ThesaurusLookupIteratorState::~ThesaurusLookupIteratorState() {}
-
-
-void ThesaurusLookupIteratorState::reset(PlanState& planState) {
- PlanIteratorState::reset(planState);
-}
-// </ThesaurusLookupIterator>
-
-#endif
-#ifndef ZORBA_NO_FULL_TEXT
-// <TokenizeNodeIterator>
-SERIALIZABLE_CLASS_VERSIONS(TokenizeNodeIterator)
-
-
-void TokenizeNodeIterator::accept(PlanIterVisitor& v) const
-{
- v.beginVisit(*this);
-
- std::vector<PlanIter_t>::const_iterator lIter = theChildren.begin();
- std::vector<PlanIter_t>::const_iterator lEnd = theChildren.end();
- for ( ; lIter != lEnd; ++lIter ){
- (*lIter)->accept(v);
- }
-
- v.endVisit(*this);
-}
-
-TokenizeNodeIterator::~TokenizeNodeIterator() {}
-
-TokenizeNodeIteratorState::TokenizeNodeIteratorState() {}
-
-TokenizeNodeIteratorState::~TokenizeNodeIteratorState() {}
-
-
-void TokenizeNodeIteratorState::reset(PlanState& planState) {
- PlanIteratorState::reset(planState);
-}
-// </TokenizeNodeIterator>
-
-#endif
-#ifndef ZORBA_NO_FULL_TEXT
-// <TokenizerPropertiesIterator>
-SERIALIZABLE_CLASS_VERSIONS(TokenizerPropertiesIterator)
-
-void TokenizerPropertiesIterator::serialize(::zorba::serialization::Archiver& ar)
-{
- serialize_baseclass(ar,
- (NaryBaseIterator<TokenizerPropertiesIterator, PlanIteratorState>*)this);
-}
-
-
-void TokenizerPropertiesIterator::accept(PlanIterVisitor& v) const
-{
- v.beginVisit(*this);
-
- std::vector<PlanIter_t>::const_iterator lIter = theChildren.begin();
- std::vector<PlanIter_t>::const_iterator lEnd = theChildren.end();
- for ( ; lIter != lEnd; ++lIter ){
- (*lIter)->accept(v);
- }
-
- v.endVisit(*this);
-}
-
-TokenizerPropertiesIterator::~TokenizerPropertiesIterator() {}
-
-// </TokenizerPropertiesIterator>
-
-#endif
-#ifndef ZORBA_NO_FULL_TEXT
-// <TokenizeStringIterator>
-SERIALIZABLE_CLASS_VERSIONS(TokenizeStringIterator)
-
-void TokenizeStringIterator::serialize(::zorba::serialization::Archiver& ar)
-{
- serialize_baseclass(ar,
- (NaryBaseIterator<TokenizeStringIterator, TokenizeStringIteratorState>*)this);
-}
-
-
-void TokenizeStringIterator::accept(PlanIterVisitor& v) const
-{
- v.beginVisit(*this);
-
- std::vector<PlanIter_t>::const_iterator lIter = theChildren.begin();
- std::vector<PlanIter_t>::const_iterator lEnd = theChildren.end();
- for ( ; lIter != lEnd; ++lIter ){
- (*lIter)->accept(v);
- }
-
- v.endVisit(*this);
-}
-
-TokenizeStringIterator::~TokenizeStringIterator() {}
-
-TokenizeStringIteratorState::TokenizeStringIteratorState() {}
-
-TokenizeStringIteratorState::~TokenizeStringIteratorState() {}
-
-
-void TokenizeStringIteratorState::reset(PlanState& planState) {
- PlanIteratorState::reset(planState);
-}
-// </TokenizeStringIterator>
-
-#endif
-
-}
-
-
=== modified file 'src/runtime/full_text/pregenerated/ft_module.h'
--- src/runtime/full_text/pregenerated/ft_module.h 2012-06-28 04:14:03 +0000
+++ src/runtime/full_text/pregenerated/ft_module.h 2012-06-29 16:44:26 +0000
@@ -29,6 +29,11 @@
#include "runtime/base/narybase.h"
+#include <deque>
+#include <list>
+#include <stack>
+#include <vector>
+#include "runtime/full_text/ft_module_util.h"
#include "runtime/full_text/ft_token_seq_iterator.h"
#include "runtime/full_text/thesaurus.h"
@@ -416,6 +421,7 @@
public:
store::Item_t doc_item_; //
FTTokenIterator_t doc_tokens_; //
+ TokenQNames token_qnames_; //
TokenizeNodeIteratorState();
@@ -426,13 +432,6 @@
class TokenizeNodeIterator : public NaryBaseIterator<TokenizeNodeIterator, TokenizeNodeIteratorState>
{
-protected:
- store::Item_t token_qname_; //
- store::Item_t lang_qname_; //
- store::Item_t para_qname_; //
- store::Item_t sent_qname_; //
- store::Item_t value_qname_; //
- store::Item_t ref_qname_; //
public:
SERIALIZABLE_CLASS(TokenizeNodeIterator);
@@ -445,12 +444,67 @@
static_context* sctx,
const QueryLoc& loc,
std::vector<PlanIter_t>& children)
- ;
+ :
+ NaryBaseIterator<TokenizeNodeIterator, TokenizeNodeIteratorState>(sctx, loc, children)
+ {}
virtual ~TokenizeNodeIterator();
-public:
- void initMembers();
+ void accept(PlanIterVisitor& v) const;
+
+ bool nextImpl(store::Item_t& result, PlanState& aPlanState) const;
+
+ void resetImpl(PlanState&) const;
+};
+
+#endif
+
+#ifndef ZORBA_NO_FULL_TEXT
+/**
+ *
+ * Author:
+ */
+class TokenizeNodesIteratorState : public PlanIteratorState
+{
+public:
+ store::Item_t doc_item_; //
+ FTTokenIterator_t doc_tokens_; //
+ TokenQNames token_qnames_; //
+ std::list<store::Item_t> includes_; //
+ std::vector<store::Item_t> excludes_; //
+ std::stack<Tokenizer*> tokenizers_; //
+ std::stack<locale::iso639_1::type> langs_; //
+ TokenizeNodesCallback callback_; //
+ Tokenizer::State t_state_; //
+ std::deque<FTToken> tokens_; //
+
+ TokenizeNodesIteratorState();
+
+ ~TokenizeNodesIteratorState();
+
+ void reset(PlanState&);
+};
+
+class TokenizeNodesIterator : public NaryBaseIterator<TokenizeNodesIterator, TokenizeNodesIteratorState>
+{
+public:
+ SERIALIZABLE_CLASS(TokenizeNodesIterator);
+
+ SERIALIZABLE_CLASS_CONSTRUCTOR2T(TokenizeNodesIterator,
+ NaryBaseIterator<TokenizeNodesIterator, TokenizeNodesIteratorState>);
+
+ void serialize( ::zorba::serialization::Archiver& ar);
+
+ TokenizeNodesIterator(
+ static_context* sctx,
+ const QueryLoc& loc,
+ std::vector<PlanIter_t>& children)
+ :
+ NaryBaseIterator<TokenizeNodesIterator, TokenizeNodesIteratorState>(sctx, loc, children)
+ {}
+
+ virtual ~TokenizeNodesIterator();
+
void accept(PlanIterVisitor& v) const;
bool nextImpl(store::Item_t& result, PlanState& aPlanState) const;
=== modified file 'src/runtime/full_text/tokenizer.cpp'
--- src/runtime/full_text/tokenizer.cpp 2012-06-28 04:14:03 +0000
+++ src/runtime/full_text/tokenizer.cpp 2012-06-29 16:44:26 +0000
@@ -21,12 +21,15 @@
#include <zorba/tokenizer.h>
#include <zorba/zorba_string.h>
+#include "api/unmarshaller.h"
#include "diagnostics/assert.h"
#include "store/api/store.h"
#include "system/globalenv.h"
#include "zorbamisc/ns_consts.h"
#include "zorbautils/locale.h"
+#include "ft_util.h"
+
using namespace zorba::locale;
namespace zorba {
@@ -38,22 +41,9 @@
}
bool Tokenizer::find_lang_attribute( Item const &item, iso639_1::type *lang ) {
- bool found_lang = false;
- if ( item.getNodeKind() == store::StoreConsts::elementNode ) {
- Iterator_t i( item.getAttributes() );
- i->open();
- for ( Item attr; i->next( attr ); ) {
- Item qname;
- if ( attr.getNodeName( qname ) &&
- qname.getLocalName() == "lang" && qname.getNamespace() == XML_NS ) {
- *lang = locale::find_lang( attr.getStringValue().c_str() );
- found_lang = true;
- break;
- }
- }
- i->close();
- }
- return found_lang;
+ return zorba::find_lang_attribute(
+ *Unmarshaller::getInternalItem( item ), lang
+ );
}
void Tokenizer::item( Item const &item, bool entering ) {
=== modified file 'src/runtime/json/jsonml_array.cpp'
--- src/runtime/json/jsonml_array.cpp 2012-06-28 04:14:03 +0000
+++ src/runtime/json/jsonml_array.cpp 2012-06-29 16:44:26 +0000
@@ -30,6 +30,7 @@
#include "util/omanip.h"
#include "util/oseparator.h"
#include "util/stl_util.h"
+#include "util/xml_util.h"
#include "jsonml_array.h"
@@ -39,20 +40,12 @@
///////////////////////////////////////////////////////////////////////////////
-static void split_name( zstring const &name, zstring *prefix, zstring *local ) {
- zstring::size_type const colon = name.find( ':' );
- if ( colon != zstring::npos ) {
- *prefix = name.substr( 0, colon );
- *local = name.substr( colon + 1 );
- if ( prefix->empty() || local->empty() )
- throw XQUERY_EXCEPTION(
- zerr::ZJPE0008_ILLEGAL_QNAME,
- ERROR_PARAMS( name )
- );
- } else {
- prefix->clear();
- *local = name;
- }
+inline void split_name( zstring const &name, zstring *prefix, zstring *local ) {
+ if ( !xml::split_name( name, prefix, local ) )
+ throw XQUERY_EXCEPTION(
+ zerr::ZJPE0008_ILLEGAL_QNAME,
+ ERROR_PARAMS( name )
+ );
}
namespace expect {
=== modified file 'src/runtime/pregenerated/iterator_enum.h'
--- src/runtime/pregenerated/iterator_enum.h 2012-06-28 21:54:08 +0000
+++ src/runtime/pregenerated/iterator_enum.h 2012-06-29 16:44:26 +0000
@@ -114,6 +114,7 @@
TYPE_StripDiacriticsIterator,
TYPE_ThesaurusLookupIterator,
TYPE_TokenizeNodeIterator,
+ TYPE_TokenizeNodesIterator,
TYPE_TokenizerPropertiesIterator,
TYPE_TokenizeStringIterator,
TYPE_FunctionNameIterator,
=== modified file 'src/runtime/spec/full_text/ft_module.xml'
--- src/runtime/spec/full_text/ft_module.xml 2012-06-28 04:14:03 +0000
+++ src/runtime/spec/full_text/ft_module.xml 2012-06-29 16:44:26 +0000
@@ -6,6 +6,12 @@
xsi:schemaLocation="http://www.zorba-xquery.com ../runtime.xsd">
<zorba:header>
+ <zorba:include form="Angle-bracket">deque</zorba:include>
+ <zorba:include form="Angle-bracket">list</zorba:include>
+ <zorba:include form="Angle-bracket">stack</zorba:include>
+ <zorba:include form="Angle-bracket">vector</zorba:include>
+ <zorba:include form="Angle-bracket">zorba/locale.h</zorba:include>
+ <zorba:include form="Quoted">runtime/full_text/ft_module_util.h</zorba:include>
<zorba:include form="Quoted">runtime/full_text/ft_token_seq_iterator.h</zorba:include>
<zorba:include form="Quoted">runtime/full_text/thesaurus.h</zorba:include>
</zorba:header>
@@ -14,6 +20,8 @@
<zorba:include form="Quoted">store/api/iterator.h</zorba:include>
</zorba:source>
+<!--========================================================================-->
+
<zorba:iterator name="CurrentCompareOptionsIterator"
preprocessorGuard="#ifndef ZORBA_NO_FULL_TEXT">
</zorba:iterator>
@@ -27,6 +35,8 @@
</zorba:function>
</zorba:iterator>
+<!--========================================================================-->
+
<zorba:iterator name="HostLangIterator"
preprocessorGuard="#ifndef ZORBA_NO_FULL_TEXT">
<zorba:function>
@@ -36,6 +46,8 @@
</zorba:function>
</zorba:iterator>
+<!--========================================================================-->
+
<zorba:iterator name="IsStemLangSupportedIterator"
preprocessorGuard="#ifndef ZORBA_NO_FULL_TEXT">
<zorba:function>
@@ -46,6 +58,8 @@
</zorba:function>
</zorba:iterator>
+<!--========================================================================-->
+
<zorba:iterator name="IsStopWordIterator"
preprocessorGuard="#ifndef ZORBA_NO_FULL_TEXT">
<zorba:function>
@@ -61,6 +75,8 @@
</zorba:function>
</zorba:iterator>
+<!--========================================================================-->
+
<zorba:iterator name="IsStopWordLangSupportedIterator"
preprocessorGuard="#ifndef ZORBA_NO_FULL_TEXT">
<zorba:function>
@@ -71,6 +87,8 @@
</zorba:function>
</zorba:iterator>
+<!--========================================================================-->
+
<zorba:iterator name="IsThesaurusLangSupportedIterator"
preprocessorGuard="#ifndef ZORBA_NO_FULL_TEXT">
<zorba:function>
@@ -86,6 +104,8 @@
</zorba:function>
</zorba:iterator>
+<!--========================================================================-->
+
<zorba:iterator name="IsTokenizerLangSupportedIterator"
preprocessorGuard="#ifndef ZORBA_NO_FULL_TEXT">
<zorba:function>
@@ -96,6 +116,8 @@
</zorba:function>
</zorba:iterator>
+<!--========================================================================-->
+
<zorba:iterator name="StemIterator"
preprocessorGuard="#ifndef ZORBA_NO_FULL_TEXT">
<zorba:function>
@@ -111,6 +133,8 @@
</zorba:function>
</zorba:iterator>
+<!--========================================================================-->
+
<zorba:iterator name="StripDiacriticsIterator"
preprocessorGuard="#ifndef ZORBA_NO_FULL_TEXT">
<zorba:function>
@@ -121,6 +145,8 @@
</zorba:function>
</zorba:iterator>
+<!--========================================================================-->
+
<zorba:iterator name="ThesaurusLookupIterator"
generateResetImpl="true"
preprocessorGuard="#ifndef ZORBA_NO_FULL_TEXT">
@@ -167,56 +193,69 @@
</zorba:state>
</zorba:iterator>
+<!--========================================================================-->
+
<zorba:iterator name="TokenizeNodeIterator"
generateResetImpl="true"
- generateSerialize="false"
- generateConstructor="false"
- preprocessorGuard="#ifndef ZORBA_NO_FULL_TEXT">
-
- <zorba:state generateInit="use-default">
- <zorba:member type="store::Item_t" name="doc_item_"/>
- <zorba:member type="FTTokenIterator_t" name="doc_tokens_"/>
- </zorba:state>
-
- <zorba:member type="store::Item_t" name="token_qname_"/>
- <zorba:member type="store::Item_t" name="lang_qname_"/>
- <zorba:member type="store::Item_t" name="para_qname_"/>
- <zorba:member type="store::Item_t" name="sent_qname_"/>
- <zorba:member type="store::Item_t" name="value_qname_"/>
- <zorba:member type="store::Item_t" name="ref_qname_"/>
-
- <zorba:method name="initMembers" return="void"/>
-
-</zorba:iterator>
+ preprocessorGuard="#ifndef ZORBA_NO_FULL_TEXT">
+ <zorba:state generateInit="use-default">
+ <zorba:member type="store::Item_t" name="doc_item_"/>
+ <zorba:member type="FTTokenIterator_t" name="doc_tokens_"/>
+ <zorba:member type="TokenQNames" name="token_qnames_"/>
+ </zorba:state>
+</zorba:iterator>
+
+<!--========================================================================-->
+
+<zorba:iterator name="TokenizeNodesIterator"
+ generateResetImpl="true"
+ preprocessorGuard="#ifndef ZORBA_NO_FULL_TEXT">
+ <zorba:state generateInit="use-default">
+ <zorba:member type="store::Item_t" name="doc_item_"/>
+ <zorba:member type="FTTokenIterator_t" name="doc_tokens_"/>
+
+ <zorba:member type="TokenQNames" name="token_qnames_"/>
+
+ <zorba:member type="std::list<store::Item_t>" name="includes_"/>
+ <zorba:member type="std::vector<store::Item_t>" name="excludes_"/>
+
+ <zorba:member type="std::stack<Tokenizer*>" name="tokenizers_"/>
+ <zorba:member type="std::stack<locale::iso639_1::type>" name="langs_"/>
+ <zorba:member type="TokenizeNodesCallback" name="callback_"/>
+ <zorba:member type="Tokenizer::State" name="t_state_"/>
+ <zorba:member type="std::deque<FTToken>" name="tokens_"/>
+ </zorba:state>
+</zorba:iterator>
+
+<!--========================================================================-->
<zorba:iterator name="TokenizerPropertiesIterator"
preprocessorGuard="#ifndef ZORBA_NO_FULL_TEXT">
</zorba:iterator>
+<!--========================================================================-->
+
<zorba:iterator name="TokenizeStringIterator"
generateResetImpl="true"
preprocessorGuard="#ifndef ZORBA_NO_FULL_TEXT">
<zorba:function>
-
<zorba:signature localname="tokenize-string" prefix="full-text">
<zorba:param>xs:string</zorba:param> <!-- string -->
<zorba:output>xs:string*</zorba:output>
</zorba:signature>
-
<zorba:signature localname="tokenize-string" prefix="full-text">
<zorba:param>xs:string</zorba:param> <!-- string -->
<zorba:param>xs:language</zorba:param> <!-- lang -->
<zorba:output>xs:string*</zorba:output>
</zorba:signature>
-
</zorba:function>
-
<zorba:state generateInit="use-default">
<zorba:member type="FTTokenSeqIterator" name="string_tokens_"/>
</zorba:state>
-
</zorba:iterator>
+<!--========================================================================-->
+
</zorba:iterators>
<!-- vim:set et sw=2 ts=2: -->
=== modified file 'src/runtime/visitors/pregenerated/planiter_visitor.h'
--- src/runtime/visitors/pregenerated/planiter_visitor.h 2012-06-28 21:54:08 +0000
+++ src/runtime/visitors/pregenerated/planiter_visitor.h 2012-06-29 16:44:26 +0000
@@ -232,6 +232,9 @@
class TokenizeNodeIterator;
#endif
#ifndef ZORBA_NO_FULL_TEXT
+ class TokenizeNodesIterator;
+#endif
+#ifndef ZORBA_NO_FULL_TEXT
class TokenizerPropertiesIterator;
#endif
#ifndef ZORBA_NO_FULL_TEXT
@@ -1015,6 +1018,10 @@
virtual void endVisit ( const TokenizeNodeIterator& ) = 0;
#endif
#ifndef ZORBA_NO_FULL_TEXT
+ virtual void beginVisit ( const TokenizeNodesIterator& ) = 0;
+ virtual void endVisit ( const TokenizeNodesIterator& ) = 0;
+#endif
+#ifndef ZORBA_NO_FULL_TEXT
virtual void beginVisit ( const TokenizerPropertiesIterator& ) = 0;
virtual void endVisit ( const TokenizerPropertiesIterator& ) = 0;
#endif
=== modified file 'src/runtime/visitors/pregenerated/printer_visitor.cpp'
--- src/runtime/visitors/pregenerated/printer_visitor.cpp 2012-06-28 21:54:08 +0000
+++ src/runtime/visitors/pregenerated/printer_visitor.cpp 2012-06-29 16:44:26 +0000
@@ -1442,6 +1442,21 @@
#endif
#ifndef ZORBA_NO_FULL_TEXT
+// <TokenizeNodesIterator>
+void PrinterVisitor::beginVisit ( const TokenizeNodesIterator& a) {
+ thePrinter.startBeginVisit("TokenizeNodesIterator", ++theId);
+ printCommons( &a, theId );
+ thePrinter.endBeginVisit( theId );
+}
+
+void PrinterVisitor::endVisit ( const TokenizeNodesIterator& ) {
+ thePrinter.startEndVisit();
+ thePrinter.endEndVisit();
+}
+// </TokenizeNodesIterator>
+
+#endif
+#ifndef ZORBA_NO_FULL_TEXT
// <TokenizerPropertiesIterator>
void PrinterVisitor::beginVisit ( const TokenizerPropertiesIterator& a) {
thePrinter.startBeginVisit("TokenizerPropertiesIterator", ++theId);
=== modified file 'src/runtime/visitors/pregenerated/printer_visitor.h'
--- src/runtime/visitors/pregenerated/printer_visitor.h 2012-06-28 21:54:08 +0000
+++ src/runtime/visitors/pregenerated/printer_visitor.h 2012-06-29 16:44:26 +0000
@@ -356,6 +356,11 @@
#endif
#ifndef ZORBA_NO_FULL_TEXT
+ void beginVisit( const TokenizeNodesIterator& );
+ void endVisit ( const TokenizeNodesIterator& );
+#endif
+
+#ifndef ZORBA_NO_FULL_TEXT
void beginVisit( const TokenizerPropertiesIterator& );
void endVisit ( const TokenizerPropertiesIterator& );
#endif
=== modified file 'src/util/xml_util.h'
--- src/util/xml_util.h 2012-06-28 04:14:03 +0000
+++ src/util/xml_util.h 2012-06-29 16:44:26 +0000
@@ -40,12 +40,14 @@
return o << version_string_of[ v ];
}
-////////// "James Clark notation" universal name functions ////////////////////
+////////// XML name handling //////////////////////////////////////////////////
/**
* Attempts to extract the local name from a "universal name".
* See: http://www.jclark.com/xml/xmlns.htm
*
+ * @tparam InputStringType The input string type.
+ * @tparam OutputStringType The output string type.
* @param uname The universal name.
* @param local A pointer to the string to receive the local name.
* @return Returns \c true only if the extraction was successful.
@@ -64,6 +66,8 @@
* Attempts to extract the URI from a "universal name".
* See: http://www.jclark.com/xml/xmlns.htm
*
+ * @tparam InputStringType The input string type.
+ * @tparam OutputStringType The output string type.
* @param uname The universal name.
* @param uri A pointer to the string to receive the URI.
* @return Returns \c true only if the extraction was successful.
@@ -80,11 +84,39 @@
return false;
}
+/**
+ * Splits an XML name at a \c : if present.
+ *
+ * @tparam InputStringType The input string type.
+ * @tparam PrefixStringType The output prefix string type.
+ * @tparam LocalStringType The output local string type.
+ * @param name The XML name to be split.
+ * @param prefix The prefix is put here, if any.
+ * @param local The local name is put here.
+ * @return If \a name contains a \c : and either the \a prefix or the \a local
+ * string would be empty, returns \c false; otherwise returns \c true.
+ */
+template<class InputStringType,class PrefixStringType,class LocalStringType>
+inline bool split_name( InputStringType const &name, PrefixStringType *prefix,
+ LocalStringType *local ) {
+ typename InputStringType::size_type const colon = name.find( ':' );
+ if ( colon != InputStringType::npos ) {
+ prefix->assign( name, 0, colon );
+ local->assign( name, colon + 1, LocalStringType::npos );
+ return !( prefix->empty() || local->empty() );
+ } else {
+ prefix->clear();
+ *local = name;
+ return true;
+ }
+}
+
////////// Character validity /////////////////////////////////////////////////
/**
* Checks whether the given code-point is valid for the given XML version.
*
+ * @tparam CodePointType The integral Unicode code-point type.
* @param v The XML version to use.
* @return Returns \c true only if the code-point is valid.
*/
@@ -196,7 +228,7 @@
/**
* Parses an XML entity reference.
*
- * @tparam StringType The type of the input string.
+ * @tparam StringType The input string type.
* @param ref The string pointing to the start of the entity reference.
* @param c A pointer to the code-point result.
* @return If successful, returns the number of characters parsed; otherwise
@@ -211,7 +243,7 @@
* Parses an XML entity reference and appends the UTF-8 encoding of the
* resulting code-point to the given string.
*
- * @tparam StringType The type of the output string.
+ * @tparam StringType The output string type.
* @param ref The C string pointing to the start of the entity reference.
* @param out A string to append to.
* @return If successful, returns the number of characters parsed; otherwise
@@ -230,8 +262,8 @@
* Parses an XML entity reference and appends the UTF-8 encoding of the
* resulting code-point to the given string.
*
- * @tparam InputStringType The type of the input string.
- * @tparam OutputStringType The type of the output string.
+ * @tparam InputStringType The input string type.
+ * @tparam OutputStringType The output string type.
* @param ref The string pointing to the start of the entity reference.
* @param out A string to append to.
* @return If successful, returns the number of characters parsed; otherwise
=== added file 'test/rbkt/ExpQueryResults/zorba/fulltext/ft-module-tokenize-nodes-1.xml.res'
--- test/rbkt/ExpQueryResults/zorba/fulltext/ft-module-tokenize-nodes-1.xml.res 1970-01-01 00:00:00 +0000
+++ test/rbkt/ExpQueryResults/zorba/fulltext/ft-module-tokenize-nodes-1.xml.res 2012-06-29 16:44:26 +0000
@@ -0,0 +1,1 @@
+true
=== added file 'test/rbkt/Queries/zorba/fulltext/ft-module-tokenize-nodes-1.xq'
--- test/rbkt/Queries/zorba/fulltext/ft-module-tokenize-nodes-1.xq 1970-01-01 00:00:00 +0000
+++ test/rbkt/Queries/zorba/fulltext/ft-module-tokenize-nodes-1.xq 2012-06-29 16:44:26 +0000
@@ -0,0 +1,42 @@
+import module namespace ft = "http://www.zorba-xquery.com/modules/full-text";
+import schema namespace fts = "http://www.zorba-xquery.com/modules/full-text";
+
+let $book :=
+ <book>
+ <title>The C++ Programming Language</title>
+ <authors>
+ <author>Bjarne Stroustrup</author>
+ </authors>
+ <chapters>
+ <chapter>
+ <title>Notes to the Reader</title>
+ <content>
+ <quote>
+ <content>
+ "The time has come," the Walrus said,
+ "to talk of many things."
+ </content>
+ <source>Lewis Carroll</source>
+ </quote>
+ <!-- more content -->
+ </content>
+ </chapter>
+ </chapters>
+ </book>
+
+let $includes := $book//chapter
+let $excludes := $book//quote
+
+let $tokens := ft:tokenize-nodes( $includes, $excludes, xs:language("en") )
+
+let $t1 := validate { $tokens[1] }
+let $t2 := validate { $tokens[2] }
+let $t3 := validate { $tokens[3] }
+let $t4 := validate { $tokens[4] }
+
+return $t1/@value = "Notes"
+ and $t2/@value = "to"
+ and $t3/@value = "the"
+ and $t4/@value = "Reader"
+
+(: vim:set et sw=2 ts=2: :)
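
For reviewers: the sketch below shows how the new split_name() helper from src/util/xml_util.h behaves on a few inputs. The function body is copied from the diff above so that the snippet compiles on its own. The SplitResult struct and try_split() wrapper are hypothetical names introduced only for this illustration and are not part of the branch. Note that the helper splits at the first ':' only and does not otherwise validate the name.

```cpp
#include <cassert>
#include <string>

// Copied from the diff to src/util/xml_util.h so this snippet is standalone.
template<class InputStringType,class PrefixStringType,class LocalStringType>
inline bool split_name( InputStringType const &name, PrefixStringType *prefix,
                        LocalStringType *local ) {
  typename InputStringType::size_type const colon = name.find( ':' );
  if ( colon != InputStringType::npos ) {
    prefix->assign( name, 0, colon );
    local->assign( name, colon + 1, LocalStringType::npos );
    return !( prefix->empty() || local->empty() );
  } else {
    prefix->clear();
    *local = name;
    return true;
  }
}

// Hypothetical convenience wrapper (not in the branch): bundles the two
// output strings and the success flag into one value.
struct SplitResult {
  bool ok;
  std::string prefix;
  std::string local;
};

inline SplitResult try_split( std::string const &name ) {
  SplitResult r;
  r.ok = split_name( name, &r.prefix, &r.local );
  return r;
}

// Exercises the cases described in the doc comment of split_name().
inline void check_split_name() {
  SplitResult const qualified = try_split( "ft:tokenize-nodes" );
  assert( qualified.ok );
  assert( qualified.prefix == "ft" );
  assert( qualified.local == "tokenize-nodes" );

  SplitResult const unprefixed = try_split( "book" );
  assert( unprefixed.ok );          // no ':' is fine; prefix is just empty
  assert( unprefixed.prefix.empty() );
  assert( unprefixed.local == "book" );

  assert( !try_split( ":local" ).ok );   // ':' present but empty prefix
  assert( !try_split( "prefix:" ).ok );  // ':' present but empty local name
}
```

Calling check_split_name() from a test driver should pass silently. A name such as "a:b:c" would yield prefix "a" and local "b:c" with a true result, so callers that need strict QName checking have to validate separately.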