zorba-coders team mailing list archive
-
zorba-coders team
-
Mailing list archive
-
Message #02771
[Merge] lp:~zorba-coders/zorba/caching into lp:zorba
Matthias Brantner has proposed merging lp:~zorba-coders/zorba/caching into lp:zorba.
Requested reviews:
Matthias Brantner (matthias-brantner)
Markos Zaharioudakis (markos-za)
For more details, see:
https://code.launchpad.net/~zorba-coders/zorba/caching/+merge/85423
- automatic caching of recursive, non-sequential, and deterministic functions with atomic parameter and return types
- %ann:cache and %ann:no-cache for controlling function result caching
--
https://code.launchpad.net/~zorba-coders/zorba/caching/+merge/85423
Your team Zorba Coders is subscribed to branch lp:zorba.
=== modified file 'ChangeLog'
--- ChangeLog 2011-12-10 00:10:52 +0000
+++ ChangeLog 2011-12-13 03:41:26 +0000
@@ -1,6 +1,9 @@
Zorba - The XQuery Processor
version 2.2
+
+ * Caching of results for recursive functions with atomic parameter and return types.
+ * Added %ann:cache and %ann:no-cache to enable or disable caching of results of functions with atomic parameter and return types.
* Added index management function to the C++ api's StaticCollectionManager.
version 2.1
@@ -77,6 +80,7 @@
* General index cannot be declared as unique if the type of its key is
xs:anyAtomicType or xs:untypedAtomic.
* Added undo for node revalidation
+ * Optimization for count(collection()) expressions
* Fixed bug #867133 (SWIG PHP build failure on Mac OSX)
* Fixed bug #872796 (validate-in-place can interfere with other update primitives)
* Fixed bug #872799 (validate-in-place can set incorrect types)
=== modified file 'doc/zorba/options.dox'
--- doc/zorba/options.dox 2011-09-14 06:15:19 +0000
+++ doc/zorba/options.dox 2011-12-13 03:41:26 +0000
@@ -278,6 +278,41 @@
In order to be able to use the value twice, the <tt>string:materialize</tt> function must be used to materialize the entire contents of the file <tt>myfile.txt</tt> in memory.
Otherwise, the error zerr:ZSTR0055 is raised.
+
+\paragraph caching_annotation Caching Results of Functions
+
+Caching of function results may improve performance when computationally
+expensive functions are invoked multiple times with the same arguments.
+
+If the optimization level is O1 or higher, Zorba automatically caches results
+of recursive, deterministic, and non-sequential functions whose parameter and
+return types are subtypes of xs:anyAtomicType. Specifically, if such a function
+is called more than once with the same arguments, the result of the first call
+will be cached and subsequent calls will return the cached value without
+re-evaluating the function.
+
+For example, in the following recursive function computing a fibonacci number,
+each result is automatically cached and, hence, dramatically improves performance.
+
+\include zorba/udf/udf-fib-rec.xq
+
+Specifically, this optimization reduces the complexity of the function from
+O(1.6^n) to O(n).
+
+In order to explicitly disable function caching, the user can specify the
+<tt>%ann:no-cache</tt> annotation.
+
+In addition, the user can use the <tt>%ann:cache</tt> annotation to cache the
+results of functions other than the ones that are automatically cached. However,
+this will only work if the function is not updating and its parameter and return
+types are subtypes of xs:anyAtomicType; otherwise, Zorba will raise a warning
+(zwarn:ZWST0005) and simply ignore the %ann:cache annotation.
+
+Please note, that explicitly enforcing caching for sequential or nondeterministic
+functions might not give the intended result. In such cases, Zorba will raise a
+warning (zwarn:ZWST0006) but obey the %ann:cache annotation.
+
+
\paragraph collection_index_annotations Annotations on Collections and Indexes
The \ref xqddf uses annotations to assign properties to collections and indexes.
=== modified file 'include/zorba/pregenerated/diagnostic_list.h'
--- include/zorba/pregenerated/diagnostic_list.h 2011-11-15 08:23:20 +0000
+++ include/zorba/pregenerated/diagnostic_list.h 2011-12-13 03:41:26 +0000
@@ -600,6 +600,8 @@
extern ZORBA_DLL_PUBLIC ZorbaErrorCode ZDDY0034_INDEX_RANGE_VALUE_PROBE_BAD_KEY_TYPES;
+extern ZORBA_DLL_PUBLIC ZorbaErrorCode ZDDY0035_INDEX_GENERAL_INSERT;
+
extern ZORBA_DLL_PUBLIC ZorbaErrorCode ZDDY0031_IC_NOT_DECLARED;
extern ZORBA_DLL_PUBLIC ZorbaErrorCode ZDDY0032_IC_NOT_ACTIVATED;
@@ -752,6 +754,10 @@
extern ZORBA_DLL_PUBLIC ZorbaWarningCode ZWST0004_AMBIGUOUS_SEQUENTIAL_FLWOR;
+extern ZORBA_DLL_PUBLIC ZorbaWarningCode ZWST0005_CACHING_NOT_POSSIBLE;
+
+extern ZORBA_DLL_PUBLIC ZorbaWarningCode ZWST0006_CACHING_MIGHT_NOT_BE_INTENDED;
+
} // namespace zwarn
} // namespace zorba
#endif /* ZORBA_DIAGNOSTIC_LIST_API_H */
=== modified file 'modules/com/zorba-xquery/www/modules/pregenerated/errors.xq'
--- modules/com/zorba-xquery/www/modules/pregenerated/errors.xq 2011-11-15 08:23:20 +0000
+++ modules/com/zorba-xquery/www/modules/pregenerated/errors.xq 2011-12-13 03:41:26 +0000
@@ -501,6 +501,10 @@
(:~
:)
+declare variable $zerr:ZDDY0035 as xs:QName := fn:QName($zerr:NS, "zerr:ZDDY0035");
+
+(:~
+:)
declare variable $zerr:ZDDY0031 as xs:QName := fn:QName($zerr:NS, "zerr:ZDDY0031");
(:~
=== modified file 'modules/com/zorba-xquery/www/modules/pregenerated/warnings.xq'
--- modules/com/zorba-xquery/www/modules/pregenerated/warnings.xq 2011-11-15 08:10:49 +0000
+++ modules/com/zorba-xquery/www/modules/pregenerated/warnings.xq 2011-12-13 03:41:26 +0000
@@ -53,4 +53,23 @@
(:~
:)
-declare variable $zwarn:ZWST0004 as xs:QName := fn:QName($zwarn:NS, "zwarn:ZWST0004");
\ No newline at end of file
+declare variable $zwarn:ZWST0004 as xs:QName := fn:QName($zwarn:NS, "zwarn:ZWST0004");
+
+(:~
+ :
+ : This warning is raised if the user explicitly enables caching
+ : of function results (using the %ann:cache annotation) but the function
+ : is updating or its parameter and return types are not subtypes of
+ : xs:anyAtomicType.
+ :
+:)
+declare variable $zwarn:ZWST0005 as xs:QName := fn:QName($zwarn:NS, "zwarn:ZWST0005");
+
+(:~
+ :
+ : This warning is raised if the user explicitly enables caching
+ : of function results (using the %ann:cache annotation) and the function
+ : is annotated as sequential or nondeterministic.
+ :
+:)
+declare variable $zwarn:ZWST0006 as xs:QName := fn:QName($zwarn:NS, "zwarn:ZWST0006");
\ No newline at end of file
=== modified file 'src/annotations/annotations.cpp'
--- src/annotations/annotations.cpp 2011-11-07 06:32:00 +0000
+++ src/annotations/annotations.cpp 2011-12-13 03:41:26 +0000
@@ -100,6 +100,9 @@
ZANN(streamable, streamable);
+ ZANN(cache, cache);
+ ZANN(no-cache, nocache);
+
//
// Zorba annotations - xqddf
//
@@ -168,6 +171,10 @@
ZANN(zann_nonsequential));
theRuleSet.push_back(
+ ZANN(zann_cache) |
+ ZANN(zann_nocache));
+
+ theRuleSet.push_back(
ZANN(fn_private) |
ZANN(fn_public));
@@ -433,6 +440,5 @@
}
-
} /* namespace zorba */
/* vim:set et sw=2 ts=2: */
=== modified file 'src/annotations/annotations.h'
--- src/annotations/annotations.h 2011-11-01 13:47:10 +0000
+++ src/annotations/annotations.h 2011-12-13 03:41:26 +0000
@@ -55,6 +55,8 @@
zann_nonassignable,
zann_sequential,
zann_nonsequential,
+ zann_cache,
+ zann_nocache,
zann_variadic,
zann_streamable,
zann_unique,
=== modified file 'src/compiler/codegen/plan_visitor.cpp'
--- src/compiler/codegen/plan_visitor.cpp 2011-12-07 20:46:23 +0000
+++ src/compiler/codegen/plan_visitor.cpp 2011-12-13 03:41:26 +0000
@@ -98,6 +98,7 @@
#endif
#include "functions/function.h"
+#include "functions/udf.h"
#include "functions/library.h"
#include "types/typeops.h"
@@ -2276,7 +2277,7 @@
{
CODEGEN_TRACE_OUT("");
- const function* func = v.get_func();
+ function* func = v.get_func();
std::vector<PlanIter_t> argv;
@@ -2320,45 +2321,12 @@
dynamic_cast<EnclosedIterator*>(iter.getp())->setInUpdateExpr();
}
}
-#if 0
else if (func->isUdf())
{
- const user_function* udf = static_cast<const user_function*>(func);
-
- if (udf->isExiting())
- {
- TypeManager* tm = v.get_type_manager();
-
- const xqtref_t& udfType = udf->getSignature().returnType();
-
- expr* body = udf->getBody();
-
- std::vector<expr*> exitExprs;
- ulong numExitExprs;
- ulong i;
-
- body->get_exprs_of_kind(exit_expr_kind, exitExprs);
-
- for (i = 0; i < numExitExprs; ++i)
- {
- if (!TypeOps::is_subtype(tm,
- *exitExprs[i]->get_return_type(),
- *udfType,
- loc))
- break;
- }
-
- if (i < numExitExprs)
- {
- UDFunctionCallIterator* udfIter =
- dynamic_cast<UDFunctionCallIterator*>(iter.getp());
-
- ZORBA_ASSERT(udfIter != NULL);
- udfIter->setCheckType();
- }
- }
+ // need to computeResultCaching here for iterprint to work
+ user_function* udf = static_cast<user_function*>(func);
+ udf->computeResultCaching(theCCB->theXQueryDiagnostics);
}
-#endif
}
else
{
=== modified file 'src/diagnostics/diagnostic_en.xml'
--- src/diagnostics/diagnostic_en.xml 2011-12-07 20:46:23 +0000
+++ src/diagnostics/diagnostic_en.xml 2011-12-13 03:41:26 +0000
@@ -2013,6 +2013,10 @@
<value>"$1": index range-value probe has search keys with incompatible types</value>
</diagnostic>
+ <diagnostic code="ZDDY0035" name="INDEX_GENERAL_INSERT">
+ <value>"$1": index inserting more than one key not allowed for general index</value>
+ </diagnostic>
+
<diagnostic code="ZDDY0031" name="IC_NOT_DECLARED">
<value>"$1": integrity constraint is not declared</value>
</diagnostic>
@@ -2323,6 +2327,37 @@
<value>Sequential FLWOR expr may not have the semantics you expect</value>
</diagnostic>
+ <diagnostic code="ZWST0005" name="CACHING_NOT_POSSIBLE">
+ <comment>
+ This warning is raised if the user explicitly enables caching
+ of function results (using the %ann:cache annotation) but the function
+ is updating or its parameter and return types are not subtypes of
+ xs:anyAtomicType.
+ </comment>
+ <value>"$1": function caching not possible; $2</value>
+ <entry key="RETURN_TYPE">
+ <value>return type ($3) is not subtype of xs:anyAtomicType</value>
+ </entry>
+ <entry key="PARAM_TYPE">
+ <value>type of parameter $3 is $4 which is not a subtype of xs:anyAtomicType</value>
+ </entry>
+ <entry key="UPDATING">
+ <value>function is updating</value>
+ </entry>
+ <entry key="VARIADIC">
+ <value>function is variadic</value>
+ </entry>
+ </diagnostic>
+
+ <diagnostic code="ZWST0006" name="CACHING_MIGHT_NOT_BE_INTENDED">
+ <comment>
+ This warning is raised if the user explicitly enables caching
+ of function results (using the %ann:cache annotation) and the function
+ is annotated as sequential or nondeterministic.
+ </comment>
+ <value>"$1": function caching might not give the intended result because the function is declared as $2</value>
+ </diagnostic>
+
</namespace>
<!--////////// Subvalues /////////////////////////////////////////////////-->
=== modified file 'src/diagnostics/pregenerated/diagnostic_list.cpp'
--- src/diagnostics/pregenerated/diagnostic_list.cpp 2011-11-15 08:23:20 +0000
+++ src/diagnostics/pregenerated/diagnostic_list.cpp 2011-12-13 03:41:26 +0000
@@ -879,6 +879,9 @@
ZorbaErrorCode ZDDY0034_INDEX_RANGE_VALUE_PROBE_BAD_KEY_TYPES( "ZDDY0034" );
+ZorbaErrorCode ZDDY0035_INDEX_GENERAL_INSERT( "ZDDY0035" );
+
+
ZorbaErrorCode ZDDY0031_IC_NOT_DECLARED( "ZDDY0031" );
@@ -1104,6 +1107,12 @@
ZorbaWarningCode ZWST0004_AMBIGUOUS_SEQUENTIAL_FLWOR( "ZWST0004" );
+ZorbaWarningCode ZWST0005_CACHING_NOT_POSSIBLE( "ZWST0005" );
+
+
+ZorbaWarningCode ZWST0006_CACHING_MIGHT_NOT_BE_INTENDED( "ZWST0006" );
+
+
} // namespace zwarn
} // namespace zorba
=== modified file 'src/diagnostics/pregenerated/dict_en.cpp'
--- src/diagnostics/pregenerated/dict_en.cpp 2011-12-01 16:19:52 +0000
+++ src/diagnostics/pregenerated/dict_en.cpp 2011-12-13 03:41:26 +0000
@@ -298,6 +298,7 @@
{ "ZDDY0032", "\"$1\": integrity constraint is not activated" },
{ "ZDDY0033", "\"$1\": integrity constraint not met for collection \"$2\"" },
{ "ZDDY0034", "\"$1\": index range-value probe has search keys with incompatible types" },
+ { "ZDDY0035", "\"$1\": index inserting more than one key not allowed for general index" },
{ "ZDST0001", "\"$1\": collection already declared" },
{ "ZDST0002", "\"$1\": collection already imported into module \"$2\"" },
{ "ZDST0003", "\"$1\": collection declaration not allowed in main module" },
@@ -360,6 +361,8 @@
{ "ZWST0002", "\"$1\": unknown or unsupported annotation" },
{ "ZWST0003", "\"$1\": function declared sequential, but has non-sequential body" },
{ "ZWST0004", "Sequential FLWOR expr may not have the semantics you expect" },
+ { "ZWST0005", "\"$1\": function caching not possible; $2" },
+ { "ZWST0006", "\"$1\": function caching might not give the intended result because the function is declared as $2" },
{ "ZXQD0001", "\"$1\": prefix not declared when calling function \"$2\" from $3" },
{ "ZXQD0002", "\"$1\": $2" },
{ "ZXQD0003", "inconsistent options to the parse-xml-fragment() function: $1" },
@@ -664,6 +667,10 @@
{ "~XUST0002_UDF_2", "\"$2\": function declared updating but body is not updating or vacuous" },
{ "~ZDST0060_unknown_localname", "unknown localname ($3)" },
{ "~ZDST0060_unknown_namespace", "unknown namespace ($3)" },
+ { "~ZWST0005_PARAM_TYPE", "type of parameter $3 is $4 which is not a subtype of xs:anyAtomicType" },
+ { "~ZWST0005_RETURN_TYPE", "return type ($3) is not subtype of xs:anyAtomicType" },
+ { "~ZWST0005_UPDATING", "function is updating" },
+ { "~ZWST0005_VARIADIC", "function is variadic" },
{ "~ZXQD0004_NON_NEGATIVE", "given value must be non-negative ($2)" },
{ "~ZXQD0004_NOT_WITHIN_RANGE", "not within allowed range ($2)" },
{ "~ZXQP0004_TypeOps_is_in_scope_ForFunctionItemTypes", "TypeOps::is_in_scope() for function-item types" },
=== modified file 'src/functions/udf.cpp'
--- src/functions/udf.cpp 2011-10-18 17:37:41 +0000
+++ src/functions/udf.cpp 2011-12-13 03:41:26 +0000
@@ -26,10 +26,15 @@
#include "compiler/rewriter/framework/rewriter.h"
#include "functions/udf.h"
+#include "annotations/annotations.h"
#include "functions/function_impl.h"
+#include "diagnostics/xquery_warning.h"
+
#include "types/typeops.h"
+#include "store/api/index.h" // needed for destruction of the cache
+
namespace zorba
{
@@ -54,7 +59,10 @@
theIsExiting(false),
theIsLeaf(true),
theIsOptimized(false),
- thePlanStateSize(0)
+ thePlanStateSize(0),
+ theCache(0),
+ theCacheResults(false),
+ theCacheComputed(false)
{
setFlag(FunctionConsts::isUDF);
resetFlag(FunctionConsts::isBuiltin);
@@ -318,7 +326,7 @@
/*******************************************************************************
********************************************************************************/
- PlanIter_t user_function::getPlan(CompilerCB* ccb, uint32_t& planStateSize)
+PlanIter_t user_function::getPlan(CompilerCB* ccb, uint32_t& planStateSize)
{
if (thePlan == NULL)
{
@@ -372,9 +380,185 @@
/*******************************************************************************
********************************************************************************/
-CODEGEN_DEF(user_function)
-{
- return new UDFunctionCallIterator(aSctx, aLoc, aArgs, this);
+store::Index* user_function::getCache() const
+{
+ return theCache.getp();
+}
+
+
+/*******************************************************************************
+
+********************************************************************************/
+void user_function::setCache(store::Index* aCache)
+{
+ theCache = aCache;
+}
+
+
+/*******************************************************************************
+
+********************************************************************************/
+bool user_function::cacheResults() const
+{
+ return theCacheResults;
+}
+
+
+/*******************************************************************************
+ only cache recursive (non-sequential, non-updating, deterministic)
+ functions with singleton atomic input and output
+********************************************************************************/
+void user_function::computeResultCaching(XQueryDiagnostics* diag)
+{
+ if (theCacheComputed)
+ {
+ return;
+ }
+
+ struct OnExit
+ {
+ private:
+ bool& theResult;
+ bool& theCacheComputed;
+
+ public:
+ OnExit(bool& aResult, bool& aCacheComputed)
+ :
+ theResult(aResult),
+ theCacheComputed(aCacheComputed) {}
+
+ void cache() { theResult = true; }
+
+ ~OnExit()
+ {
+ theCacheComputed = true;
+ }
+ };
+
+ // will be destroyed when the function is exited
+ // set caching to true if cache() is called
+ OnExit lExit(theCacheResults, theCacheComputed);
+
+ // check necessary conditions
+ // %ann:cache or not %ann:no-cache
+ if (theAnnotationList &&
+ theAnnotationList->contains(AnnotationInternal::zann_nocache))
+ {
+ return;
+ }
+
+ // was the %ann:cache annotation given explicitly by the user
+ bool lExplicitCacheRequest =
+ (theAnnotationList ?
+ theAnnotationList->contains(AnnotationInternal::zann_cache) :
+ false);
+
+ if (isVariadic())
+ {
+ if (lExplicitCacheRequest)
+ {
+ diag->add_warning(
+ NEW_XQUERY_WARNING(zwarn::ZWST0005_CACHING_NOT_POSSIBLE,
+ WARN_PARAMS(getName()->getStringValue(), ZED(ZWST0005_VARIADIC)),
+ WARN_LOC(theLoc)));
+ }
+ return;
+ }
+
+ // parameter and return types are subtype of xs:anyAtomicType?
+ const xqtref_t& lRes = theSignature.returnType();
+ TypeManager* tm = lRes->get_manager();
+
+ if (!TypeOps::is_subtype(tm,
+ *lRes,
+ *GENV_TYPESYSTEM.ANY_ATOMIC_TYPE_ONE,
+ theLoc))
+ {
+ if (lExplicitCacheRequest)
+ {
+ diag->add_warning(
+ NEW_XQUERY_WARNING(zwarn::ZWST0005_CACHING_NOT_POSSIBLE,
+ WARN_PARAMS(getName()->getStringValue(),
+ ZED(ZWST0005_RETURN_TYPE),
+ lRes->toString()),
+ WARN_LOC(theLoc)));
+ }
+ return;
+ }
+
+ csize lArity = theSignature.paramCount();
+ for (csize i = 0; i < lArity; ++i)
+ {
+ const xqtref_t& lArg = theSignature[i];
+ if (!TypeOps::is_subtype(tm,
+ *lArg,
+ *GENV_TYPESYSTEM.ANY_ATOMIC_TYPE_ONE,
+ theLoc))
+ {
+ if (lExplicitCacheRequest)
+ {
+ diag->add_warning(
+ NEW_XQUERY_WARNING(zwarn::ZWST0005_CACHING_NOT_POSSIBLE,
+ WARN_PARAMS(getName()->getStringValue(),
+ ZED(ZWST0005_PARAM_TYPE),
+ i+1,
+ lArg->toString()),
+ WARN_LOC(theLoc)));
+ }
+ return;
+ }
+ }
+
+ // function updating?
+ if (isUpdating())
+ {
+ if (lExplicitCacheRequest)
+ {
+ diag->add_warning(
+ NEW_XQUERY_WARNING(zwarn::ZWST0005_CACHING_NOT_POSSIBLE,
+ WARN_PARAMS(getName()->getStringValue(), ZED(ZWST0005_UPDATING)),
+ WARN_LOC(theLoc)));
+ }
+ return;
+ }
+
+ if (isSequential() || !isDeterministic())
+ {
+ if (lExplicitCacheRequest)
+ {
+ diag->add_warning(
+ NEW_XQUERY_WARNING(zwarn::ZWST0006_CACHING_MIGHT_NOT_BE_INTENDED,
+ WARN_PARAMS(getName()->getStringValue(),
+ (isSequential()?"sequential":"non-deterministic")),
+ WARN_LOC(theLoc)));
+
+ lExit.cache();
+ }
+ return;
+ }
+
+
+ // optimization is prerequisite before invoking isRecursive
+ if (!lExplicitCacheRequest && isOptimized() && !isRecursive())
+ {
+ return;
+ }
+
+ lExit.cache();
+}
+
+
+/*******************************************************************************
+
+********************************************************************************/
+PlanIter_t user_function::codegen(
+ CompilerCB* cb,
+ static_context* sctx,
+ const QueryLoc& loc,
+ std::vector<PlanIter_t>& argv,
+ AnnotationHolder& ann) const
+{
+ return new UDFunctionCallIterator(sctx, loc, argv, this);
}
=== modified file 'src/functions/udf.h'
--- src/functions/udf.h 2011-10-18 17:37:41 +0000
+++ src/functions/udf.h 2011-12-13 03:41:26 +0000
@@ -25,6 +25,12 @@
namespace zorba
{
+ namespace store
+ {
+ class Index;
+ typedef rchandle<Index> Index_t;
+ }
+
/*******************************************************************************
A udf with params $x1, $x2, ..., $xn and a body_expr is translated into a
@@ -78,6 +84,25 @@
references of an arg var, these references are "mutually exclusive", ie,
at most one of the references will actually be reached during each particular
execution of the body.
+
+ theCache:
+ ---------
+ Maps the arg values of an invocation to the result of that invocation.
+ If an invocation uses the same arg values as a previous invocation, the cached
+ result is simply returned without re-evaluating the udf.
+
+ theCacheResults:
+ ----------------
+ Tells whether caching should be done for this udf or not.
+
+ theCacheComputed:
+ -----------------
+ Tells whether theCacheResults has been computed already or not.
+ theCacheResults is computed by the computeResultCaching() method, which is
+ invoked during codegen every time a udf call is encountered. The same udf may
+ be invoked multiple times, but the computation of theCacheResults needs to
+ be done only once. So, during the 1st invcocation of computeResultCaching(),
+ theCacheComputed is set to true, and subsequent invocations are noops.
********************************************************************************/
class user_function : public function
{
@@ -102,6 +127,10 @@
uint32_t thePlanStateSize;
std::vector<ArgVarRefs> theArgVarsRefs;
+ store::Index_t theCache;
+ bool theCacheResults;
+ bool theCacheComputed;
+
public:
SERIALIZABLE_CLASS(user_function)
user_function(::zorba::serialization::Archiver& ar);
@@ -160,6 +189,14 @@
const std::vector<ArgVarRefs>& getArgVarsRefs() const;
+ store::Index* getCache() const;
+
+ void setCache(store::Index* aCache);
+
+ bool cacheResults() const;
+
+ void computeResultCaching(XQueryDiagnostics*);
+
PlanIter_t codegen(
CompilerCB* cb,
static_context* sctx,
=== modified file 'src/runtime/core/fncall_iterator.cpp'
--- src/runtime/core/fncall_iterator.cpp 2011-10-30 00:18:34 +0000
+++ src/runtime/core/fncall_iterator.cpp 2011-12-13 03:41:26 +0000
@@ -48,6 +48,11 @@
#include "util/string_util.h"
+#include "store/api/index.h"
+#include "store/api/store.h"
+#include "store/api/iterator_factory.h"
+#include "store/api/temp_seq.h"
+
#ifdef ZORBA_WITH_DEBUGGER
#include "debugger/debugger_commons.h"
@@ -93,7 +98,8 @@
thePlanState(NULL),
thePlanStateSize(0),
theLocalDCtx(NULL),
- thePlanOpen(false)
+ thePlanOpen(false),
+ theCache(0)
{
}
@@ -153,6 +159,8 @@
{
thePlan->reset(*thePlanState);
}
+
+ theArgValues.clear();
}
@@ -198,6 +206,113 @@
/*******************************************************************************
********************************************************************************/
+bool UDFunctionCallIterator::isCached() const
+{
+ return theUDF->cacheResults();
+}
+
+
+/*******************************************************************************
+
+********************************************************************************/
+void UDFunctionCallIterator::createCache(
+ PlanState& planState,
+ UDFunctionCallIteratorState* state)
+{
+ store::Index_t lIndex = theUDF->getCache();
+
+ if (!lIndex && theUDF->cacheResults())
+ {
+ const signature& sig = theUDF->getSignature();
+
+ csize numArgs = theChildren.size();
+
+ store::IndexSpecification lSpec;
+ lSpec.theNumKeyColumns = numArgs;
+ lSpec.theKeyTypes.resize(numArgs);
+ lSpec.theCollations.resize(numArgs);
+ lSpec.theIsTemp = true;
+ lSpec.theIsUnique = true;
+
+ for (csize i = 0; i < numArgs; ++i)
+ {
+ lSpec.theKeyTypes[i] = sig[i]->getBaseBuiltinType()->get_qname().getp();
+ }
+
+ lIndex = GENV_STORE.createIndex(theUDF->getName(), lSpec, 0);
+
+ theUDF->setCache(lIndex.getp()); // cache the cache in the function itself
+
+ state->theArgValues.reserve(numArgs);
+ }
+
+ state->theCache = lIndex.getp();
+}
+
+
+/*******************************************************************************
+
+********************************************************************************/
+bool UDFunctionCallIterator::probeCache(
+ PlanState& planState,
+ UDFunctionCallIteratorState* state,
+ store::Item_t& result,
+ std::vector<store::Item_t>& aKey) const
+{
+ if (!state->theCache)
+ return false;
+
+ store::IndexCondition_t lCond =
+ state->theCache->createCondition(store::IndexCondition::POINT_VALUE);
+
+ std::vector<store::Iterator_t>::iterator lIter = state->theArgWrappers.begin();
+
+ for (; lIter != state->theArgWrappers.end(); ++lIter)
+ {
+ store::Iterator_t& argWrapper = (*lIter);
+ store::Item_t lArg;
+ if (argWrapper) // might be 0 if argument is not used
+ {
+ argWrapper->next(lArg); // guaranteed to have exactly one result
+ }
+ aKey.push_back(lArg);
+ lCond->pushItem(lArg);
+ }
+
+ store::IndexProbeIterator_t probeIte =
+ GENV_STORE.getIteratorFactory()->createIndexProbeIterator(state->theCache);
+
+ probeIte->init(lCond);
+ probeIte->open();
+ return probeIte->next(result);
+}
+
+
+/*******************************************************************************
+
+********************************************************************************/
+void UDFunctionCallIterator::insertCacheEntry(
+ UDFunctionCallIteratorState* state,
+ std::vector<store::Item_t>& aKey,
+ store::Item_t& aValue) const
+{
+ if (state->theCache)
+ {
+ std::auto_ptr<store::IndexKey> k(new store::IndexKey());
+ store::IndexKey* k2 = k.get();
+ k->theItems = aKey;
+ store::Item_t lTmp = aValue; // insert will eventually transfer the Item_t
+ if (!state->theCache->insert(k2, lTmp))
+ {
+ k.release();
+ }
+ }
+}
+
+
+/*******************************************************************************
+
+********************************************************************************/
void UDFunctionCallIterator::openImpl(PlanState& planState, uint32_t& offset)
{
UDFunctionCallIteratorState* state;
@@ -229,6 +344,11 @@
// the plan state (but not the state block) and dynamic context.
state->open(planState, theUDF);
+ // if the results of the function should be cached (prereq: atomic in and out)
+ // this functions stores an index in the dynamic context that contains
+ // the cached results. The name of the index is the name of the function.
+ createCache(planState, state);
+
// Create a wrapper over each subplan that computes an argument expr, if the
// associated param is actually used anywhere in the function body.
csize numArgs = theChildren.size();
@@ -301,6 +421,9 @@
{
try
{
+ std::vector<store::Item_t> lKey;
+ bool lCacheHit;
+
UDFunctionCallIteratorState* state;
DEFAULT_STACK_INIT(UDFunctionCallIteratorState, state, planState);
@@ -314,19 +437,31 @@
state->thePlanOpen = true;
}
- // Bind the args.
+ // check if there is a cache and the result is already in the cache
+ lCacheHit = probeCache(planState, state, result, lKey);
+
+ // if not in the cache, we bind the arguments to the function
+ if (!lCacheHit)
{
const std::vector<ArgVarRefs>& argsRefs = theUDF->getArgVarsRefs();
- std::vector<ArgVarRefs>::const_iterator argsRefsIte = argsRefs.begin();
- std::vector<ArgVarRefs>::const_iterator argsRefsEnd = argsRefs.end();
-
- std::vector<store::Iterator_t>::iterator argWrapsIte =
- state->theArgWrappers.begin();
-
- for (; argsRefsIte != argsRefsEnd; ++argsRefsIte, ++argWrapsIte)
+ const std::vector<store::Iterator_t>& argWraps = state->theArgWrappers;
+
+ for (size_t i = 0; i < argsRefs.size(); ++i)
{
- store::Iterator_t& argWrapper = (*argWrapsIte);
- const ArgVarRefs& argVarRefs = (*argsRefsIte);
+ const ArgVarRefs& argVarRefs = argsRefs[i];
+ store::Iterator_t argWrapper;
+ if (state->theCache)
+ {
+ std::vector<store::Item_t> lParam(1, lKey[i]);
+ state->theArgValues.push_back(GENV_STORE.createTempSeq(lParam));
+ argWrapper = state->theArgValues.back()->getIterator();
+ argWrapper->open();
+ }
+ else
+ {
+ argWrapper = argWraps[i];
+ }
+
ArgVarRefs::const_iterator argVarRefsIte = argVarRefs.begin();
ArgVarRefs::const_iterator argVarRefsEnd = argVarRefs.end();
@@ -343,26 +478,35 @@
}
}
-#ifdef ZORBA_WITH_DEBUGGER
- DEBUGGER_PUSH_FRAME;
-#endif
-
- while (consumeNext(result, state->thePlan, *state->thePlanState))
+ if (lCacheHit)
{
-#ifdef ZORBA_WITH_DEBUGGER
- DEBUGGER_POP_FRAME;
-#endif
STACK_PUSH(true, state);
-
+ }
+ else
+ {
#ifdef ZORBA_WITH_DEBUGGER
DEBUGGER_PUSH_FRAME;
#endif
+ while (consumeNext(result, state->thePlan, *state->thePlanState))
+ {
+#ifdef ZORBA_WITH_DEBUGGER
+ DEBUGGER_POP_FRAME;
+#endif
+
+ insertCacheEntry(state, lKey, result);
+ STACK_PUSH(true, state);
+
+#ifdef ZORBA_WITH_DEBUGGER
+ DEBUGGER_PUSH_FRAME;
+#endif
+ }
+
+#ifdef ZORBA_WITH_DEBUGGER
+ DEBUGGER_POP_FRAME;
+#endif
+
}
-#ifdef ZORBA_WITH_DEBUGGER
- DEBUGGER_POP_FRAME;
-#endif
-
STACK_END(state);
}
=== modified file 'src/runtime/core/fncall_iterator.h'
--- src/runtime/core/fncall_iterator.h 2011-10-30 00:18:34 +0000
+++ src/runtime/core/fncall_iterator.h 2011-12-13 03:41:26 +0000
@@ -69,16 +69,31 @@
the body. So, it is never the case that the arg expr will have more than one
consumers, and as a result we can bind all those V references to the same arg
wrapper.
+
+ theCache:
+ ---------
+ Is an Index which is set in the state if caching for the invoked function
+ should be done. The cache is owned by the UDF itself and shared across
+ all function invocations.
+
+ theCacheHits:
+ -------------
+ If caching is used, this vector contains the results of all arguments
+ of the function evaluation. It's used to bind the variables if the
+ cache didn't give a result in order to avoid duplicate evaluation of
+ the arguments.
********************************************************************************/
class UDFunctionCallIteratorState : public PlanIteratorState
{
public:
- PlanIterator * thePlan;
- PlanState * thePlanState;
- uint32_t thePlanStateSize;
- dynamic_context * theLocalDCtx;
- bool thePlanOpen;
- std::vector<store::Iterator_t> theArgWrappers;
+ PlanIterator * thePlan;
+ PlanState * thePlanState;
+ uint32_t thePlanStateSize;
+ dynamic_context * theLocalDCtx;
+ bool thePlanOpen;
+ std::vector<store::Iterator_t> theArgWrappers;
+ store::Index * theCache;
+ std::vector<store::TempSeq_t> theArgValues;
UDFunctionCallIteratorState();
@@ -130,6 +145,8 @@
void setDynamic() { theIsDynamic = true; }
+ bool isCached() const;
+
void accept(PlanIterVisitor& v) const;
void openImpl(PlanState& planState, uint32_t& offset);
@@ -139,6 +156,22 @@
void closeImpl(PlanState& planState);
bool nextImpl(store::Item_t& result, PlanState& planState) const;
+
+protected:
+ void createCache(
+ PlanState& planState,
+ UDFunctionCallIteratorState* state);
+
+ bool probeCache(
+ PlanState& planState,
+ UDFunctionCallIteratorState* state,
+ store::Item_t& result,
+ std::vector<store::Item_t>& aKey) const;
+
+ void insertCacheEntry(
+ UDFunctionCallIteratorState* state,
+ std::vector<store::Item_t>& aKey,
+ store::Item_t& aValue) const;
};
=== modified file 'src/runtime/visitors/printer_visitor_impl.cpp'
--- src/runtime/visitors/printer_visitor_impl.cpp 2011-12-01 11:02:25 +0000
+++ src/runtime/visitors/printer_visitor_impl.cpp 2011-12-13 03:41:26 +0000
@@ -1209,7 +1209,23 @@
#undef TYPED_VAL_CMP
- PRINTER_VISITOR_DEFINITION (UDFunctionCallIterator)
+ void PrinterVisitor::beginVisit ( const UDFunctionCallIterator& a )
+ {
+ thePrinter.startBeginVisit("UDFunctionCallIterator", ++theId);
+ printCommons( &a, theId );
+ if (a.isCached())
+ {
+ thePrinter.addAttribute("cached", "true");
+ }
+ thePrinter.endBeginVisit( theId);
+ }
+
+ void PrinterVisitor::endVisit ( const UDFunctionCallIterator& )
+ {
+ thePrinter.startEndVisit();
+ thePrinter.endEndVisit();
+ }
+
PRINTER_VISITOR_DEFINITION (ExtFunctionCallIterator)
PRINTER_VISITOR_DEFINITION (FnBooleanIterator)
PRINTER_VISITOR_DEFINITION (OrIterator)
=== modified file 'src/store/api/index.h'
--- src/store/api/index.h 2011-10-11 12:11:48 +0000
+++ src/store/api/index.h 2011-12-13 03:41:26 +0000
@@ -415,6 +415,18 @@
* Returns all keys stored in this index
*/
virtual KeyIterator_t keys() const = 0;
+
+ /**
+ * Insert the given item in the value set of the given key. If the key is not
+ * in the index already, then the key itself is inserted as well. Return true
+ * if the key was already in the index, false otherwise
+ * The index wil take the ownership of the key if it was not already in the
+ * index.
+ *
+ * @error ZDDY0035 if a key with more than one item is inserted into
+ * a general index
+ */
+ virtual bool insert(store::IndexKey*& key, store::Item_t& item) = 0;
};
=== modified file 'src/store/naive/simple_index_general.cpp'
--- src/store/naive/simple_index_general.cpp 2011-11-15 18:48:54 +0000
+++ src/store/naive/simple_index_general.cpp 2011-12-13 03:41:26 +0000
@@ -236,6 +236,20 @@
/******************************************************************************
*******************************************************************************/
+bool GeneralIndex::insert(store::IndexKey*& key, store::Item_t& value)
+{
+ if (key->size() != 1)
+ {
+ RAISE_ERROR_NO_LOC(zerr::ZDDY0035_INDEX_GENERAL_INSERT,
+ ERROR_PARAMS(getName()->getStringValue()));
+ }
+ return insert((*key)[0], value);
+}
+
+
+/******************************************************************************
+
+*******************************************************************************/
bool GeneralIndex::insert(store::Item_t& key, store::Item_t& node)
{
bool lossy = false;
=== modified file 'src/store/naive/simple_index_general.h'
--- src/store/naive/simple_index_general.h 2011-11-15 18:48:54 +0000
+++ src/store/naive/simple_index_general.h 2011-12-13 03:41:26 +0000
@@ -174,6 +174,8 @@
bool insert(store::Item_t& key, store::Item_t& node);
+ bool insert(store::IndexKey*& key, store::Item_t& value);
+
virtual bool remove(const store::Item_t& key, store::Item_t& item, bool all) = 0;
};
=== modified file 'src/store/naive/simple_store.cpp'
--- src/store/naive/simple_store.cpp 2011-11-02 17:19:09 +0000
+++ src/store/naive/simple_store.cpp 2011-12-13 03:41:26 +0000
@@ -555,6 +555,9 @@
store::Iterator* aSourceIter,
ulong aNumColumns)
{
+ if (!aSourceIter)
+ return;
+
store::Item_t domainItem;
store::IndexKey* key = NULL;
=== modified file 'src/store/naive/simple_temp_seq.cpp'
--- src/store/naive/simple_temp_seq.cpp 2011-08-19 17:04:48 +0000
+++ src/store/naive/simple_temp_seq.cpp 2011-12-13 03:41:26 +0000
@@ -53,27 +53,6 @@
/*******************************************************************************
********************************************************************************/
-store::Item_t SimpleTempSeq::operator[](xs_integer aIndex)
-{
- uint64_t lIndex;
- try {
- lIndex = to_xs_unsignedLong(aIndex);
- } catch (std::range_error& e)
- {
- throw ZORBA_EXCEPTION(
- zerr::ZSTR0060_RANGE_EXCEPTION,
- ERROR_PARAMS(
- BUILD_STRING("access out of bounds " << e.what() << ")")
- )
- );
- }
- return theItems[lIndex];
-}
-
-
-/*******************************************************************************
-
-********************************************************************************/
xs_integer SimpleTempSeq::getSize()
{
return theItems.size();
@@ -312,14 +291,15 @@
case none:
if ( theCurPos < to_xs_unsignedLong(theTempSeq->getSize()) )
{
- result = (*theTempSeq)[theCurPos];
+ result = theTempSeq->theItems[theCurPos];
return true;
}
break;
+
case startEnd:
if ( theCurPos < theEndPos )
{
- result = (*theTempSeq)[theCurPos];
+ result = theTempSeq->theItems[theCurPos];
return true;
}
break;
@@ -336,17 +316,17 @@
{
case none:
if ( theCurPos < to_xs_unsignedLong(theTempSeq->getSize()) )
- return (*theTempSeq)[theCurPos];
+ return theTempSeq->theItems[theCurPos];
break;
case startEnd:
if ( theCurPos < theEndPos )
- return (*theTempSeq)[theCurPos];
+ return theTempSeq->theItems[theCurPos];
break;
}
return NULL;
}
-
+
void SimpleTempSeqIter::reset()
{
=== modified file 'src/store/naive/simple_temp_seq.h'
--- src/store/naive/simple_temp_seq.h 2011-07-29 23:01:30 +0000
+++ src/store/naive/simple_temp_seq.h 2011-12-13 03:41:26 +0000
@@ -35,6 +35,8 @@
********************************************************************************/
class SimpleTempSeq : public store::TempSeq
{
+ friend class SimpleTempSeqIter;
+
private:
std::vector<store::Item_t> theItems;
@@ -55,8 +57,6 @@
void append(store::Iterator_t iter, bool copy);
- store::Item_t operator[](xs_integer aIndex);
-
void getItem(xs_integer position, store::Item_t& res);
bool containsItem(xs_integer position);
@@ -102,9 +102,9 @@
SimpleTempSeq_t theTempSeq;
BorderType theBorderType;
- uint64_t theCurPos;
- uint64_t theStartPos;
- uint64_t theEndPos;
+ uint64_t theCurPos;
+ uint64_t theStartPos;
+ uint64_t theEndPos;
public:
SimpleTempSeqIter() {}
=== added file 'test/rbkt/ExpCompilerResults/IterPlan/zorba/udf/udf-fib-rec.iter'
--- test/rbkt/ExpCompilerResults/IterPlan/zorba/udf/udf-fib-rec.iter 1970-01-01 00:00:00 +0000
+++ test/rbkt/ExpCompilerResults/IterPlan/zorba/udf/udf-fib-rec.iter 2011-12-13 03:41:26 +0000
@@ -0,0 +1,42 @@
+Iterator tree for main query:
+<UDFunctionCallIterator cached="true">
+ <SingletonIterator value="xs:integer(100)"/>
+</UDFunctionCallIterator>
+
+Iterator tree for local:fib:
+<flwor::FLWORIterator>
+ <ForVariable name="n">
+ <LetVarIterator varname="n"/>
+ </ForVariable>
+ <ReturnClause>
+ <IfThenElseIterator>
+ <TypedValueCompareIterator_INTEGER>
+ <ForVarIterator varname="n"/>
+ <SingletonIterator value="xs:integer(0)"/>
+ </TypedValueCompareIterator_INTEGER>
+ <SingletonIterator value="xs:integer(0)"/>
+ <IfThenElseIterator>
+ <TypedValueCompareIterator_INTEGER>
+ <ForVarIterator varname="n"/>
+ <SingletonIterator value="xs:integer(1)"/>
+ </TypedValueCompareIterator_INTEGER>
+ <SingletonIterator value="xs:integer(1)"/>
+ <SpecificNumArithIterator_AddOperation_INTEGER>
+ <UDFunctionCallIterator cached="true">
+ <SpecificNumArithIterator_SubtractOperation_INTEGER>
+ <ForVarIterator varname="n"/>
+ <SingletonIterator value="xs:integer(1)"/>
+ </SpecificNumArithIterator_SubtractOperation_INTEGER>
+ </UDFunctionCallIterator>
+ <UDFunctionCallIterator cached="true">
+ <SpecificNumArithIterator_SubtractOperation_INTEGER>
+ <ForVarIterator varname="n"/>
+ <SingletonIterator value="xs:integer(2)"/>
+ </SpecificNumArithIterator_SubtractOperation_INTEGER>
+ </UDFunctionCallIterator>
+ </SpecificNumArithIterator_AddOperation_INTEGER>
+ </IfThenElseIterator>
+ </IfThenElseIterator>
+ </ReturnClause>
+</flwor::FLWORIterator>
+
=== added file 'test/rbkt/ExpQueryResults/zorba/udf/udf-fib-rec.xml.res'
--- test/rbkt/ExpQueryResults/zorba/udf/udf-fib-rec.xml.res 1970-01-01 00:00:00 +0000
+++ test/rbkt/ExpQueryResults/zorba/udf/udf-fib-rec.xml.res 2011-12-13 03:41:26 +0000
@@ -0,0 +1,1 @@
+3736710778780434371
=== added file 'test/rbkt/Queries/zorba/udf/udf-fib-rec.xq'
--- test/rbkt/Queries/zorba/udf/udf-fib-rec.xq 1970-01-01 00:00:00 +0000
+++ test/rbkt/Queries/zorba/udf/udf-fib-rec.xq 2011-12-13 03:41:26 +0000
@@ -0,0 +1,8 @@
+declare function local:fib($n as xs:integer) as xs:integer
+{
+ if ($n eq 0) then 0
+ else if ($n eq 1) then 1
+ else local:fib($n - 1) + local:fib($n - 2)
+};
+
+local:fib(100)
Follow ups
-
[Merge] lp:~zorba-coders/zorba/caching into lp:zorba
From: noreply, 2011-12-13
-
[Merge] lp:~zorba-coders/zorba/caching into lp:zorba
From: Zorba Build Bot, 2011-12-13
-
[Merge] lp:~zorba-coders/zorba/caching into lp:zorba
From: Zorba Build Bot, 2011-12-13
-
[Merge] lp:~zorba-coders/zorba/caching into lp:zorba
From: Markos Zaharioudakis, 2011-12-13
-
Re: [Merge] lp:~zorba-coders/zorba/caching into lp:zorba
From: Markos Zaharioudakis, 2011-12-13
-
[Merge] lp:~zorba-coders/zorba/caching into lp:zorba
From: Zorba Build Bot, 2011-12-13
-
Re: [Merge] lp:~zorba-coders/zorba/caching into lp:zorba
From: Zorba Build Bot, 2011-12-13
-
[Merge] lp:~zorba-coders/zorba/caching into lp:zorba
From: Zorba Build Bot, 2011-12-13
-
[Merge] lp:~zorba-coders/zorba/caching into lp:zorba
From: Zorba Build Bot, 2011-12-13
-
[Merge] lp:~zorba-coders/zorba/caching into lp:zorba
From: Matthias Brantner, 2011-12-13
-
Re: [Merge] lp:~zorba-coders/zorba/caching into lp:zorba
From: Matthias Brantner, 2011-12-13
-
[Merge] lp:~zorba-coders/zorba/caching into lp:zorba
From: Matthias Brantner, 2011-12-13