launchpad-reviewers team mailing list archive
-
launchpad-reviewers team
-
Mailing list archive
-
Message #04450
[Merge] lp:~adeuring/launchpad/bug-739052-2 into lp:launchpad
Abel Deuring has proposed merging lp:~adeuring/launchpad/bug-739052-2 into lp:launchpad with lp:~adeuring/launchpad/bug-739052 as a prerequisite.
Requested reviews:
Launchpad code reviewers (launchpad-reviewers)
For more details, see:
https://code.launchpad.net/~adeuring/launchpad/bug-739052-2/+merge/70044
(see https://code.launchpad.net/~adeuring/launchpad/bug-739052/+merge/69625 for a description of StormRangeFactory)
This branch fixes a serious issue discovered by lifeless when he reviewed the prerequisite branch of this one: lp:~adeuring/launchpad/bug-739052 :
The WHERE query used in StormRangeFactory.getSlice() was plain wrong when the Storm result set is ordered by two or more columns:
- If we have a query like SELECT ... FROM ... ORDER BY table.col1, table.col2
- If the result set contains rows with values for col1, col2 like
1 1
1 2
1 3
2 1
2 2
2 3
- If the memo value is (1, 2), the old variant would use a slicing related WHERE clause
"col1 > 1 AND col2 > 2", which would return only the rows (1, 3) and (2, 3)
The fix is obvious: If a result set is ordered by N columns, we must
- first look for rows where the first N-1 sort columns have the value given by memo, and where
column N has values greater than the memo value;
- then look for rows where the first N-2 sort columns have the value given by memo, and where
column N-1 has values greater than the memo value;
- [repeat for N-2..1]
I chose quite arbitrarily to simply OR combine these WHERE expressions; as noted in a comment, it might sometimes be better to use sub-SELECTs for each expression and to combine these sub-selects by UNION ALL.
Up to now, the class StormRangeFactory has not been tested with production data -- I am really not sure if we can predict when which query variant will be faster, and I doubt that we can be sure that one variant will be faster than the other for all queries. So I think we might need to sub-class StormRangeFactory() if the UNION ALL variant turns out to be more useful in certain cases. It should be enough to override the method getSlicedFromMemo().
I also renamed getSortExpressions() with getorderBy(), as suggested by lifeless.
test:
./bin/test canonical -vvt canonical.launchpad.webapp.tests.test_batching.TestStormRangeFactory
no lint
--
https://code.launchpad.net/~adeuring/launchpad/bug-739052-2/+merge/70044
Your team Launchpad code reviewers is requested to review the proposed merge of lp:~adeuring/launchpad/bug-739052-2 into lp:launchpad.
=== modified file 'lib/canonical/launchpad/webapp/batching.py'
--- lib/canonical/launchpad/webapp/batching.py 2011-08-01 16:17:14 +0000
+++ lib/canonical/launchpad/webapp/batching.py 2011-08-01 16:17:17 +0000
@@ -1,4 +1,4 @@
-# Copyright 2009 Canonical Ltd. This software is licensed under the
+# Copyright 2011 Canonical Ltd. This software is licensed under the
# GNU Affero General Public License version 3 (see the file LICENSE).
__metaclass__ = type
@@ -9,7 +9,11 @@
from lazr.batchnavigator.interfaces import IRangeFactory
import simplejson
from storm import Undef
-from storm.expr import Desc
+from storm.expr import (
+ And,
+ Desc,
+ Or,
+ )
from storm.properties import PropertyColumn
from storm.zope.interfaces import IResultSet
from zope.component import adapts
@@ -175,6 +179,10 @@
store.find(Bug, Bug.id < 10)
works.
+
+ Note: This factory assumes that the result set is fully sorted,
+ i.e. that the set of the column values used for sorting is
+ distinct for each result row.
"""
implements(IRangeFactory)
@@ -194,7 +202,7 @@
self.plain_resultset = resultset
self.error_cb = error_cb
- def getSortExpressions(self):
+ def getOrderBy(self):
"""Return the order_by expressions of the result set."""
return removeSecurityProxy(self.plain_resultset)._order_by
@@ -204,7 +212,7 @@
sort_values = []
if not zope_isinstance(row, tuple):
row = (row, )
- sort_expressions = self.getSortExpressions()
+ sort_expressions = self.getOrderBy()
if sort_expressions is Undef:
raise StormRangeFactoryError(
'StormRangeFactory requires a sorted result set.')
@@ -213,7 +221,7 @@
expression = expression.expr
if not zope_isinstance(expression, PropertyColumn):
raise StormRangeFactoryError(
- 'StormRangeFactory supports only sorting by '
+ 'StormRangeFactory only supports sorting by '
'PropertyColumn, not by %r.' % expression)
class_instance_found = False
for row_part in row:
@@ -267,7 +275,7 @@
'memo must be the JSON representation of a list.')
return None
- sort_expressions = self.getSortExpressions()
+ sort_expressions = self.getOrderBy()
if len(sort_expressions) != len(parsed_memo):
self.reportError(
'Invalid number of elements in memo string. '
@@ -323,17 +331,74 @@
return [
invert_sort_expression(expression)
- for expression in self.getSortExpressions()]
-
- def whereExpressionFromSortExpression(self, expression, memo):
- """Create a Storm expression to be used in the WHERE clause of the
- slice query.
- """
- if isinstance(expression, Desc):
- expression = expression.expr
- return expression < memo
- else:
- return expression > memo
+ for expression in self.getOrderBy()]
+
+ def andClausesForLeadingColumns(self, limits):
+ def plain_expression(expression):
+ # Strip a possible Desc from an expression.
+ if isinstance(expression, Desc):
+ return expression.expr
+ else:
+ return expression
+ return [plain_expression(column) == memo for column, memo in limits]
+
+ def whereExpressions(self, limits):
+ """Generate WHERE expressions for the given sort columns and
+ memos values.
+
+ :return: A list of where expressions which should be OR-ed
+ :param limits: A sequence of (column, mem_value) tuples.
+
+ Note that the result set may be ordered by more than one column.
+ Given the sort columns c[1], c[2] ... c[N] and assuming ascending
+ sorting, we must look for all rows where
+ * c[1] == memo[1], c[2] == memo[2] ... c[N-1] == memo[N-1] and
+ c[N] > memo[N]
+ * c[1] == memo[1], c[2] == memo[2] ... c[N-2] == memo[N-2] and
+ c[N-1] > memo[N-1]
+ ...
+ * c[1] == memo[1], c[2] > memo[2]
+ * c[1] > memo[1]
+ """
+ start = limits[:-1]
+ last_expression, last_memo = limits[-1]
+ if isinstance(last_expression, Desc):
+ last_expression = last_expression.expr
+ last_limit = last_expression < last_memo
+ else:
+ last_limit = last_expression > last_memo
+ if len(start) > 0:
+ clauses = self.andClausesForLeadingColumns(start)
+ clauses.append(last_limit)
+ clause_for_last_column = reduce(And, clauses)
+ return [clause_for_last_column] + self.whereExpressions(start)
+ else:
+ return [last_limit]
+
+ def getSliceFromMemo(self, size, memo):
+ """Return a result set for the given memo values.
+
+ Note that at least two other implementatians are possible:
+ Instead of OR-combining the expressions returned by
+ whereExpressions(), these expressions could be used for
+ separate SELECTs which are then merged with UNION ALL.
+
+ We could also issue separate Storm queries for each
+ expression and combine the results here.
+
+ Which variant is more efficient is yet unknown; it may
+ differ between different queries.
+ """
+ sort_expressions = self.getOrderBy()
+ where = self.whereExpressions(zip(sort_expressions, memo))
+ where = reduce(Or, where)
+ # From storm.zope.interfaces.IResultSet.__doc__:
+ # - C{find()}, C{group_by()} and C{having()} are really
+ # used to configure result sets, so are mostly intended
+ # for use on the model side.
+ naked_result = removeSecurityProxy(self.resultset).find(where)
+ result = ProxyFactory(naked_result)
+ return result.config(limit=size)
def getSlice(self, size, endpoint_memo='', forwards=True):
"""See `IRangeFactory`."""
@@ -343,14 +408,4 @@
if parsed_memo is None:
return self.resultset.config(limit=size)
else:
- sort_expressions = self.getSortExpressions()
- where = [
- self.whereExpressionFromSortExpression(expression, memo)
- for expression, memo in zip(sort_expressions, parsed_memo)]
- # From storm.zope.interfaces.IResultSet.__doc__:
- # - C{find()}, C{group_by()} and C{having()} are really
- # used to configure result sets, so are mostly intended
- # for use on the model side.
- naked_result = removeSecurityProxy(self.resultset).find(*where)
- result = ProxyFactory(naked_result)
- return result.config(limit=size)
+ return self.getSliceFromMemo(size, parsed_memo)
=== modified file 'lib/canonical/launchpad/webapp/interfaces.py'
--- lib/canonical/launchpad/webapp/interfaces.py 2011-08-01 16:17:14 +0000
+++ lib/canonical/launchpad/webapp/interfaces.py 2011-08-01 16:17:17 +0000
@@ -880,7 +880,7 @@
self.request = request
-class StormRangeFactoryError(AssertionError):
+class StormRangeFactoryError(Exception):
"""Raised when a Storm result set cannot be used for slicing by a
StormRangeFactory.
"""
=== modified file 'lib/canonical/launchpad/webapp/tests/test_batching.py'
--- lib/canonical/launchpad/webapp/tests/test_batching.py 2011-08-01 16:17:14 +0000
+++ lib/canonical/launchpad/webapp/tests/test_batching.py 2011-08-01 16:17:17 +0000
@@ -9,11 +9,9 @@
from lazr.batchnavigator.interfaces import IRangeFactory
from storm.expr import (
+ compile,
Desc,
- Gt,
- Lt,
)
-from storm.variables import IntVariable
from zope.security.proxy import isinstance as zope_isinstance
from canonical.launchpad.components.decoratedresultset import (
@@ -169,7 +167,7 @@
str(exception))
def test_DatetimeJSONEncoder(self):
- # DateTimeJSONEncoder represents Pytjon datetime objects as strings
+ # DateTimeJSONEncoder represents Python datetime objects as strings
# where the value is represented in the ISO time format.
self.assertEqual(
'"2011-07-25T00:00:00"',
@@ -354,29 +352,43 @@
self.assertIs(Person.id, reverse_person_id.expr)
self.assertIs(Person.name, person_name)
- def test_whereExpressionFromSortExpression__asc(self):
- """For ascending sort order, whereExpressionFromSortExpression()
- returns the WHERE clause expression > memo.
- """
- resultset = self.makeStormResultSet()
- range_factory = StormRangeFactory(resultset, self.logError)
- where_clause = range_factory.whereExpressionFromSortExpression(
- expression=Person.id, memo=1)
- self.assertTrue(isinstance(where_clause, Gt))
- self.assertIs(where_clause.expr1, Person.id)
- self.assertTrue(where_clause.expr2, IntVariable)
-
- def test_whereExpressionFromSortExpression_desc(self):
- """For descending sort order, whereExpressionFromSortExpression()
- returns the WHERE clause expression < memo.
- """
- resultset = self.makeStormResultSet()
- range_factory = StormRangeFactory(resultset, self.logError)
- where_clause = range_factory.whereExpressionFromSortExpression(
- expression=Desc(Person.id), memo=1)
- self.assertTrue(isinstance(where_clause, Lt))
- self.assertIs(where_clause.expr1, Person.id)
- self.assertTrue(where_clause.expr2, IntVariable)
+ def test_whereExpressions__asc(self):
+ """For ascending sort order, whereExpressions() returns the
+ WHERE clause expression > memo.
+ """
+ resultset = self.makeStormResultSet()
+ range_factory = StormRangeFactory(resultset, self.logError)
+ [where_clause] = range_factory.whereExpressions([(Person.id, 1)])
+ self.assertEquals('Person.id > ?', compile(where_clause))
+
+ def test_whereExpressions_desc(self):
+ """For descending sort order, whereExpressions() returns the
+ WHERE clause expression < memo.
+ """
+ resultset = self.makeStormResultSet()
+ range_factory = StormRangeFactory(resultset, self.logError)
+ [where_clause] = range_factory.whereExpressions(
+ [(Desc(Person.id), 1)])
+ self.assertEquals('Person.id < ?', compile(where_clause))
+
+ def test_whereExpressions__two_sort_columns(self):
+ """If the sort columns and memo values (c1, m1) and (c2, m2)
+ are specified, whereExpressions() returns two expressions where
+ the first expression is
+
+ c1 == m1 AND c2 > m2
+
+ and the second expression is
+
+ c1 > m1
+ """
+ resultset = self.makeStormResultSet()
+ range_factory = StormRangeFactory(resultset, self.logError)
+ [where_clause_1, where_clause_2] = range_factory.whereExpressions(
+ [(Person.id, 1), (Person.name, 'foo')])
+ self.assertEquals(
+ 'Person.id = ? AND Person.name > ?', compile(where_clause_1))
+ self.assertEquals('Person.id > ?', compile(where_clause_2))
def test_getSlice__forward_without_memo(self):
resultset = self.makeStormResultSet()
@@ -416,6 +428,43 @@
sliced_result = range_factory.getSlice(3, memo, forwards=False)
self.assertEqual(expected, list(sliced_result))
+ def makeResultSetWithPartiallyIdenticalSortData(self):
+ # Create a result set, where each value of
+ # LibraryFileAlias.filename and of LibraryFileAlias.mimetype
+ # appears twice. (Note that Bug.bugattachments returns a
+ # DecoratedResultSet, where the Storm query returns
+ # LibraryFileAlias rows too.)
+ bug = self.factory.makeBug()
+ with person_logged_in(bug.owner):
+ for filename in ('file-a', 'file-b'):
+ for content_type in ('text/a', 'text/b'):
+ for count in range(2):
+ self.factory.makeBugAttachment(
+ bug=bug, owner=bug.owner, filename=filename,
+ content_type=content_type)
+ return bug.attachments
+
+ def test_getSlice__multiple_sort_columns(self):
+ # Slicing works if the result set is ordered by more than
+ # one column, and if the result contains rows with identical
+ resultset = self.makeResultSetWithPartiallyIdenticalSortData()
+ resultset.order_by(
+ LibraryFileAlias.mimetype, LibraryFileAlias.filename,
+ LibraryFileAlias.id)
+ all_results = list(resultset)
+ # We have two results
+ # The third and fourth row have identical filenames and
+ # mime types, but the LibraryFileAlias IDs are different
+ # If we use the values from the third row as memo parameters
+ # for a forward batch, the slice of this btach starts with
+ # the fourth row of the entire result set.
+ memo_lfa = all_results[2].libraryfile
+ memo = simplejson.dumps(
+ [memo_lfa.mimetype, memo_lfa.filename, memo_lfa.id])
+ range_factory = StormRangeFactory(resultset)
+ sliced_result = range_factory.getSlice(3, memo)
+ self.assertEqual(all_results[3:6], list(sliced_result))
+
def test_getSlice__decorated_resultset(self):
resultset = self.makeDecoratedStormResultSet()
resultset.order_by(LibraryFileAlias.id)