launchpad-reviewers team mailing list archive

Thread
Date

[Merge] lp:~adeuring/launchpad/bug-739052-2 into lp:launchpad

To: mp+70044@xxxxxxxxxxxxxxxxxx
From: Abel Deuring <abel.deuring@xxxxxxxxxxxxx>
Date: Mon, 01 Aug 2011 16:17:33 -0000
Reply-to: mp+70044@xxxxxxxxxxxxxxxxxx
Sender: bounces@xxxxxxxxxxxxx

Abel Deuring has proposed merging lp:~adeuring/launchpad/bug-739052-2 into lp:launchpad with lp:~adeuring/launchpad/bug-739052 as a prerequisite.

Requested reviews:
  Launchpad code reviewers (launchpad-reviewers)

For more details, see:
https://code.launchpad.net/~adeuring/launchpad/bug-739052-2/+merge/70044

(see https://code.launchpad.net/~adeuring/launchpad/bug-739052/+merge/69625 for a description of StormRangeFactory)

This branch fixes a serious issue discovered by lifeless when he reviewed the prerequisite branch of this one: lp:~adeuring/launchpad/bug-739052 :


The WHERE query used in StormRangeFactory.getSlice() was plain wrong when the Storm result set is ordered by two or more columns:

- If we have a query like SELECT ... FROM ... ORDER BY table.col1, table.col2
- If the result set contains rows with values for col1, col2 like

    1    1
    1    2
    1    3
    2    1
    2    2
    2    3

- If the memo value is (1, 2), the old variant would use a slicing related WHERE clause
  "col1 > 1 AND col2 > 2", which would return only the rows (1, 3) and (2, 3)

The fix is obvious: If a result set is ordered by N columns, we must
- first look for rows where the first N-1 sort columns have the value given by memo, and where
  column N has values greater than the memo value;
- then look for rows where the first N-2 sort columns have the value given by memo, and where
  column N-1 has values greater than the memo value;
- [repeat for N-2..1]

I chose quite arbitrarily to simply OR combine these WHERE expressions; as noted in a comment, it might sometimes be better to use sub-SELECTs for each expression and to combine these sub-selects by UNION ALL.

Up to now, the class StormRangeFactory has not been tested with production data -- I am really not sure if we can predict when which query variant will be faster, and I doubt that we can be sure that  one variant will be faster than the other for all queries. So I think we might need to sub-class StormRangeFactory() if the UNION ALL variant turns out to be more useful in certain cases. It should be enough to override the method getSlicedFromMemo().

I also renamed getSortExpressions() with getorderBy(), as suggested by lifeless.

test:


./bin/test canonical -vvt canonical.launchpad.webapp.tests.test_batching.TestStormRangeFactory

no lint

-- 
https://code.launchpad.net/~adeuring/launchpad/bug-739052-2/+merge/70044
Your team Launchpad code reviewers is requested to review the proposed merge of lp:~adeuring/launchpad/bug-739052-2 into lp:launchpad.

=== modified file 'lib/canonical/launchpad/webapp/batching.py'
--- lib/canonical/launchpad/webapp/batching.py	2011-08-01 16:17:14 +0000
+++ lib/canonical/launchpad/webapp/batching.py	2011-08-01 16:17:17 +0000
@@ -1,4 +1,4 @@
-# Copyright 2009 Canonical Ltd.  This software is licensed under the
+# Copyright 2011 Canonical Ltd.  This software is licensed under the
 # GNU Affero General Public License version 3 (see the file LICENSE).
 
 __metaclass__ = type
@@ -9,7 +9,11 @@
 from lazr.batchnavigator.interfaces import IRangeFactory
 import simplejson
 from storm import Undef
-from storm.expr import Desc
+from storm.expr import (
+    And,
+    Desc,
+    Or,
+    )
 from storm.properties import PropertyColumn
 from storm.zope.interfaces import IResultSet
 from zope.component import adapts
@@ -175,6 +179,10 @@
           store.find(Bug, Bug.id < 10)
 
       works.
+
+    Note: This factory assumes that the result set is fully sorted,
+    i.e. that the set of the column values used for sorting is
+    distinct for each result row.
     """
     implements(IRangeFactory)
 
@@ -194,7 +202,7 @@
             self.plain_resultset = resultset
         self.error_cb = error_cb
 
-    def getSortExpressions(self):
+    def getOrderBy(self):
         """Return the order_by expressions of the result set."""
         return removeSecurityProxy(self.plain_resultset)._order_by
 
@@ -204,7 +212,7 @@
         sort_values = []
         if not zope_isinstance(row, tuple):
             row = (row, )
-        sort_expressions = self.getSortExpressions()
+        sort_expressions = self.getOrderBy()
         if sort_expressions is Undef:
             raise StormRangeFactoryError(
                 'StormRangeFactory requires a sorted result set.')
@@ -213,7 +221,7 @@
                 expression = expression.expr
             if not zope_isinstance(expression, PropertyColumn):
                 raise StormRangeFactoryError(
-                    'StormRangeFactory supports only sorting by '
+                    'StormRangeFactory only supports sorting by '
                     'PropertyColumn, not by %r.' % expression)
             class_instance_found = False
             for row_part in row:
@@ -267,7 +275,7 @@
                 'memo must be the JSON representation of a list.')
             return None
 
-        sort_expressions = self.getSortExpressions()
+        sort_expressions = self.getOrderBy()
         if len(sort_expressions) != len(parsed_memo):
             self.reportError(
                 'Invalid number of elements in memo string. '
@@ -323,17 +331,74 @@
 
         return [
             invert_sort_expression(expression)
-            for expression in self.getSortExpressions()]
-
-    def whereExpressionFromSortExpression(self, expression, memo):
-        """Create a Storm expression to be used in the WHERE clause of the
-        slice query.
-        """
-        if isinstance(expression, Desc):
-            expression = expression.expr
-            return expression < memo
-        else:
-            return expression > memo
+            for expression in self.getOrderBy()]
+
+    def andClausesForLeadingColumns(self, limits):
+        def plain_expression(expression):
+            # Strip a possible Desc from an expression.
+            if isinstance(expression, Desc):
+                return expression.expr
+            else:
+                return expression
+        return [plain_expression(column) == memo for column, memo in limits]
+
+    def whereExpressions(self, limits):
+        """Generate WHERE expressions for the given sort columns and
+        memos values.
+
+        :return: A list of where expressions which should be OR-ed
+        :param limits: A sequence of (column, mem_value) tuples.
+
+        Note that the result set may be ordered by more than one column.
+        Given the sort columns c[1], c[2] ... c[N] and assuming ascending
+        sorting, we must look for all rows where
+          * c[1] == memo[1], c[2] == memo[2] ... c[N-1] == memo[N-1] and
+            c[N] > memo[N]
+          * c[1] == memo[1], c[2] == memo[2] ... c[N-2] == memo[N-2] and
+            c[N-1] > memo[N-1]
+          ...
+          * c[1] == memo[1], c[2] > memo[2]
+          * c[1] > memo[1]
+        """
+        start = limits[:-1]
+        last_expression, last_memo = limits[-1]
+        if isinstance(last_expression, Desc):
+            last_expression = last_expression.expr
+            last_limit = last_expression < last_memo
+        else:
+            last_limit = last_expression > last_memo
+        if len(start) > 0:
+            clauses = self.andClausesForLeadingColumns(start)
+            clauses.append(last_limit)
+            clause_for_last_column = reduce(And, clauses)
+            return [clause_for_last_column] + self.whereExpressions(start)
+        else:
+            return [last_limit]
+
+    def getSliceFromMemo(self, size, memo):
+        """Return a result set for the given memo values.
+
+        Note that at least two other implementatians are possible:
+        Instead of OR-combining the expressions returned by
+        whereExpressions(), these expressions could be used for
+        separate SELECTs which are then merged with UNION ALL.
+
+        We could also issue separate Storm queries for each
+        expression and combine the results here.
+
+        Which variant is more efficient is yet unknown; it may
+        differ between different queries.
+        """
+        sort_expressions = self.getOrderBy()
+        where = self.whereExpressions(zip(sort_expressions, memo))
+        where = reduce(Or, where)
+        # From storm.zope.interfaces.IResultSet.__doc__:
+        #     - C{find()}, C{group_by()} and C{having()} are really
+        #       used to configure result sets, so are mostly intended
+        #       for use on the model side.
+        naked_result = removeSecurityProxy(self.resultset).find(where)
+        result = ProxyFactory(naked_result)
+        return result.config(limit=size)
 
     def getSlice(self, size, endpoint_memo='', forwards=True):
         """See `IRangeFactory`."""
@@ -343,14 +408,4 @@
         if parsed_memo is None:
             return self.resultset.config(limit=size)
         else:
-            sort_expressions = self.getSortExpressions()
-            where = [
-                self.whereExpressionFromSortExpression(expression, memo)
-                for expression, memo in zip(sort_expressions, parsed_memo)]
-            # From storm.zope.interfaces.IResultSet.__doc__:
-            #     - C{find()}, C{group_by()} and C{having()} are really
-            #       used to configure result sets, so are mostly intended
-            #       for use on the model side.
-            naked_result = removeSecurityProxy(self.resultset).find(*where)
-            result = ProxyFactory(naked_result)
-            return result.config(limit=size)
+            return self.getSliceFromMemo(size, parsed_memo)

=== modified file 'lib/canonical/launchpad/webapp/interfaces.py'
--- lib/canonical/launchpad/webapp/interfaces.py	2011-08-01 16:17:14 +0000
+++ lib/canonical/launchpad/webapp/interfaces.py	2011-08-01 16:17:17 +0000
@@ -880,7 +880,7 @@
             self.request = request
 
 
-class StormRangeFactoryError(AssertionError):
+class StormRangeFactoryError(Exception):
     """Raised when a Storm result set cannot be used for slicing by a
     StormRangeFactory.
     """

=== modified file 'lib/canonical/launchpad/webapp/tests/test_batching.py'
--- lib/canonical/launchpad/webapp/tests/test_batching.py	2011-08-01 16:17:14 +0000
+++ lib/canonical/launchpad/webapp/tests/test_batching.py	2011-08-01 16:17:17 +0000
@@ -9,11 +9,9 @@
 
 from lazr.batchnavigator.interfaces import IRangeFactory
 from storm.expr import (
+    compile,
     Desc,
-    Gt,
-    Lt,
     )
-from storm.variables import IntVariable
 from zope.security.proxy import isinstance as zope_isinstance
 
 from canonical.launchpad.components.decoratedresultset import (
@@ -169,7 +167,7 @@
             str(exception))
 
     def test_DatetimeJSONEncoder(self):
-        # DateTimeJSONEncoder represents Pytjon datetime objects as strings
+        # DateTimeJSONEncoder represents Python datetime objects as strings
         # where the value is represented in the ISO time format.
         self.assertEqual(
             '"2011-07-25T00:00:00"',
@@ -354,29 +352,43 @@
         self.assertIs(Person.id, reverse_person_id.expr)
         self.assertIs(Person.name, person_name)
 
-    def test_whereExpressionFromSortExpression__asc(self):
-        """For ascending sort order, whereExpressionFromSortExpression()
-        returns the WHERE clause expression > memo.
-        """
-        resultset = self.makeStormResultSet()
-        range_factory = StormRangeFactory(resultset, self.logError)
-        where_clause = range_factory.whereExpressionFromSortExpression(
-            expression=Person.id, memo=1)
-        self.assertTrue(isinstance(where_clause, Gt))
-        self.assertIs(where_clause.expr1, Person.id)
-        self.assertTrue(where_clause.expr2, IntVariable)
-
-    def test_whereExpressionFromSortExpression_desc(self):
-        """For descending sort order, whereExpressionFromSortExpression()
-        returns the WHERE clause expression < memo.
-        """
-        resultset = self.makeStormResultSet()
-        range_factory = StormRangeFactory(resultset, self.logError)
-        where_clause = range_factory.whereExpressionFromSortExpression(
-            expression=Desc(Person.id), memo=1)
-        self.assertTrue(isinstance(where_clause, Lt))
-        self.assertIs(where_clause.expr1, Person.id)
-        self.assertTrue(where_clause.expr2, IntVariable)
+    def test_whereExpressions__asc(self):
+        """For ascending sort order, whereExpressions() returns the
+        WHERE clause expression > memo.
+        """
+        resultset = self.makeStormResultSet()
+        range_factory = StormRangeFactory(resultset, self.logError)
+        [where_clause] = range_factory.whereExpressions([(Person.id, 1)])
+        self.assertEquals('Person.id > ?', compile(where_clause))
+
+    def test_whereExpressions_desc(self):
+        """For descending sort order, whereExpressions() returns the
+        WHERE clause expression < memo.
+        """
+        resultset = self.makeStormResultSet()
+        range_factory = StormRangeFactory(resultset, self.logError)
+        [where_clause] = range_factory.whereExpressions(
+            [(Desc(Person.id), 1)])
+        self.assertEquals('Person.id < ?', compile(where_clause))
+
+    def test_whereExpressions__two_sort_columns(self):
+        """If the sort columns and memo values (c1, m1) and (c2, m2)
+        are specified, whereExpressions() returns two expressions where
+        the first expression is
+
+            c1 == m1 AND c2 > m2
+
+        and the second expression is
+
+            c1 > m1
+        """
+        resultset = self.makeStormResultSet()
+        range_factory = StormRangeFactory(resultset, self.logError)
+        [where_clause_1, where_clause_2] = range_factory.whereExpressions(
+            [(Person.id, 1), (Person.name, 'foo')])
+        self.assertEquals(
+            'Person.id = ? AND Person.name > ?', compile(where_clause_1))
+        self.assertEquals('Person.id > ?', compile(where_clause_2))
 
     def test_getSlice__forward_without_memo(self):
         resultset = self.makeStormResultSet()
@@ -416,6 +428,43 @@
         sliced_result = range_factory.getSlice(3, memo, forwards=False)
         self.assertEqual(expected, list(sliced_result))
 
+    def makeResultSetWithPartiallyIdenticalSortData(self):
+        # Create a result set, where each value of
+        # LibraryFileAlias.filename and of LibraryFileAlias.mimetype
+        # appears twice. (Note that Bug.bugattachments returns a
+        # DecoratedResultSet, where the Storm query returns
+        # LibraryFileAlias rows too.)
+        bug = self.factory.makeBug()
+        with person_logged_in(bug.owner):
+            for filename in ('file-a', 'file-b'):
+                for content_type in ('text/a', 'text/b'):
+                    for count in range(2):
+                        self.factory.makeBugAttachment(
+                            bug=bug, owner=bug.owner, filename=filename,
+                            content_type=content_type)
+        return bug.attachments
+
+    def test_getSlice__multiple_sort_columns(self):
+        # Slicing works if the result set is ordered by more than
+        # one column, and if the result contains rows with identical
+        resultset = self.makeResultSetWithPartiallyIdenticalSortData()
+        resultset.order_by(
+            LibraryFileAlias.mimetype, LibraryFileAlias.filename,
+            LibraryFileAlias.id)
+        all_results = list(resultset)
+        # We have two results
+        # The third and fourth row have identical filenames and
+        # mime types, but the LibraryFileAlias IDs are different
+        # If we use the values from the third row as memo parameters
+        # for a forward batch, the slice of this btach starts with
+        # the fourth row of the entire result set.
+        memo_lfa = all_results[2].libraryfile
+        memo = simplejson.dumps(
+            [memo_lfa.mimetype, memo_lfa.filename, memo_lfa.id])
+        range_factory = StormRangeFactory(resultset)
+        sliced_result = range_factory.getSlice(3, memo)
+        self.assertEqual(all_results[3:6], list(sliced_result))
+
     def test_getSlice__decorated_resultset(self):
         resultset = self.makeDecoratedStormResultSet()
         resultset.order_by(LibraryFileAlias.id)