← Back to team overview

launchpad-reviewers team mailing list archive

[Merge] lp:~lifeless/launchpad/malone into lp:launchpad/devel

 

Robert Collins has proposed merging lp:~lifeless/launchpad/malone into lp:launchpad/devel.

Requested reviews:
  Launchpad code reviewers (launchpad-reviewers)
Related bugs:
  #607776 +filebug-show-similar FTI query timeouts
  https://bugs.launchpad.net/bugs/607776


Further improve the performance of bug duplicate detection by switching from | (any term matches) to & (all terms must match), with an expansion to handle one missing term.

This reduces over 90% of the overhead in tests so far, taking a 8second operation(on staging) down to 800msec.

I'm putting this up for review now to get feedback on the changes; I'm going to spin it up on staging and see if it delivers what it looks like it promises.
-- 
https://code.launchpad.net/~lifeless/launchpad/malone/+merge/30904
Your team Launchpad code reviewers is requested to review the proposed merge of lp:~lifeless/launchpad/malone into lp:launchpad/devel.
=== modified file 'lib/canonical/database/nl_search.py'
--- lib/canonical/database/nl_search.py	2010-07-21 19:55:38 +0000
+++ lib/canonical/database/nl_search.py	2010-07-25 22:07:47 +0000
@@ -73,17 +73,30 @@
 def _nl_phrase_search(terms, table, constraints, extra_constraints_tables):
     """Perform a very simple pruning of the phrase, letting fti do ranking.
 
+    This function groups the terms with & clause, and creates an additional
+    & grouping for each subset of terms created by discarding one term.
+    
+    When there are less than 3 terms, additional groupings are not created.
+
     See nl_phrase_search for the contract of this function.
     """
-    bad_words = set(['firefox', 'ubuntu', 'launchpad'])
     terms = set(terms)
-    filtered_terms = []
-    for term in terms:
-        if term.lower() in bad_words:
-            continue
-        filtered_terms.append(term)
-    # sorted for testing convenience.
-    return '|'.join(sorted(filtered_terms))
+    # As a special case, we don't expand a 2-term search : its very slow and
+    # too noisy at the moment.
+    # sorted for doctesting convenience - should have no impact on tsearch2.
+    if len(terms) < 3:
+        return '&'.join(sorted(terms))
+    # Expand
+    and_groups = [None] * (len(terms) + 1)
+    for pos in range(len(terms) + 1):
+        and_groups[pos] = set(terms)
+    # sorted for doctesting convenience - should have no impact on tsearch2.
+    for pos, term in enumerate(sorted(terms)):
+        and_groups[pos + 1].discard(term)
+    # sorted for doctesting convenience - should have no impact on tsearch2.
+    and_clauses = ['(' + '&'.join(sorted(group)) + ')'
+        for group in and_groups]
+    return '|'.join(and_clauses)
 
 
 def _slow_nl_phrase_search(terms, table, constraints,

=== modified file 'lib/canonical/launchpad/doc/textsearching.txt'
--- lib/canonical/launchpad/doc/textsearching.txt	2010-07-21 19:55:38 +0000
+++ lib/canonical/launchpad/doc/textsearching.txt	2010-07-25 22:07:47 +0000
@@ -622,11 +622,17 @@
     >>> nl_phrase_search('how do I do this?', Question)
     ''
 
-The fast code path only removes firefox, ubuntu and launchpad today:
+The fast code path does not remove any terms. Rather it uses an & query over
+all the terms combined with an & query for each ordinal-1 subset of the terms:
 
     >>> nl_phrase_search('system is slow when running firefox on ubuntu',
     ...     Question)
-    u'run|slow|system'
+    u'(firefox&run&slow&system&ubuntu)|(run&slow&system&ubuntu)|(firefox&slow&system&ubuntu)|(firefox&run&system&ubuntu)|(firefox&run&slow&ubuntu)|(firefox&run&slow&system)'
+
+Short queries are not expanded like this:
+
+    >>> nl_phrase_search('system is slow', Question)
+    u'slow&system'
 
 
 ==== Using other constraints ====
@@ -655,10 +661,10 @@
     ...     ['Product'], fast_enabled=False)
     u'slow|flickr'
 
-When the query only has stop words or common words in it, the returned
-query will be the empty string:
+When the query only has stop words in it, the returned query will be the empty
+string:
 
-    >>> nl_phrase_search('ubuntu will not do it', Question)
+    >>> nl_phrase_search('will not do it', Question)
     ''
 
 When there are no candidate rows, only stemming and stop words removal
@@ -668,7 +674,7 @@
     0
     >>> nl_phrase_search('firefox is very slow on flickr', Question,
     ...                  'product = -1')
-    u'flickr|slow'
+    u'(firefox&flickr&slow)|(flickr&slow)|(firefox&slow)|(firefox&flickr)'
 
 
 ==== No keywords filtering with few rows ====
@@ -705,4 +711,4 @@
     ...     'firefox is slow', Question,
     ...     'distribution = %s AND sourcepackagename = %s' % sqlvalues(
     ...     ubuntu, firefox_package.sourcepackagename))
-    u'slow'
+    u'firefox&slow'

=== modified file 'lib/lp/answers/doc/faqtarget.txt'
--- lib/lp/answers/doc/faqtarget.txt	2010-07-21 19:55:38 +0000
+++ lib/lp/answers/doc/faqtarget.txt	2010-07-25 22:07:47 +0000
@@ -159,7 +159,6 @@
     >>> for faq in target.findSimilarFAQs('How do I use the Answer Tracker'):
     ...     print faq.title
     How to answer a question
-    How to use bug mail
     How to become a Launchpad king
 
 The results are ordered by relevancy. The first document is considered

=== modified file 'lib/lp/answers/doc/questiontarget.txt'
--- lib/lp/answers/doc/questiontarget.txt	2010-07-21 19:55:38 +0000
+++ lib/lp/answers/doc/questiontarget.txt	2010-07-25 22:07:47 +0000
@@ -359,10 +359,6 @@
     ...     print t.title
     New question
     Another question
-    Question title3
-    Question title2
-    Question title1
-    Question title0
 
 
 

=== modified file 'lib/lp/answers/stories/question-add-in-other-languages.txt'
--- lib/lp/answers/stories/question-add-in-other-languages.txt	2010-07-22 05:48:07 +0000
+++ lib/lp/answers/stories/question-add-in-other-languages.txt	2010-07-25 22:07:47 +0000
@@ -41,10 +41,18 @@
 understood by any member of the support community.
 
     >>> similar_questions = find_tag_by_id(browser.contents, 'similar-questions')
-    >>> for row in similar_questions.findAll('tr', 'noted'):
-    ...     row.first('a').renderContents()
-    'Installation of Java Runtime Environment for Mozilla'
-    'Problema al recompilar kernel con soporte smp (doble-n\xc3\xbacleo)'
+    
+    XXX: Making search fast has a significant impact on this tests' use case,
+    because there are 9 terms - the query has to ignore 7 of the terms to
+    permit a hit under the new & based logic (which is needed for scaling).
+    When we push better relevance and cheap ordering into the core we will be
+    able to test this case again. Note that in production this exact use case
+    already finds nothing.
+
+    #>>> for row in similar_questions.findAll('tr', 'noted'):
+    #...     row.first('a').renderContents()
+    #'Installation of Java Runtime Environment for Mozilla'
+    #'Problema al recompilar kernel con soporte smp (doble-n\xc3\xbacleo)'
 
     >>> for tag in find_tags_by_class(browser.contents, 'warning message'):
     ...     print tag.renderContents()

=== modified file 'lib/lp/answers/stories/question-add.txt'
--- lib/lp/answers/stories/question-add.txt	2010-07-22 05:48:07 +0000
+++ lib/lp/answers/stories/question-add.txt	2010-07-25 22:07:47 +0000
@@ -66,8 +66,17 @@
 The user enters his problem summary and a list of similar FAQs and
 questions are displayed.
 
+XXX: Original search, disabled due to performance issues RBC 20100725. This
+will be reinstated when cheap relevance filtering is available / when search
+is overhauled.
+
     >>> user_browser.getControl('Summary').value = (
     ...     'Visiting a web page requiring java crashes firefox')
+
+For now, use a closer search:
+
+    >>> user_browser.getControl('Summary').value = (
+    ...     'java web pages')
     >>> user_browser.getControl('Continue').click()
     >>> contents = find_main_content(user_browser.contents)
     >>> similar_faqs = contents.find(id='similar-faqs')
@@ -122,7 +131,7 @@
 summary he provided.
 
     >>> user_browser.getControl('Summary').value
-    'Visiting a web page requiring java crashes firefox'
+    'java web pages'
 
 If the user doesn't provide details, he'll get an error message:
 

=== modified file 'lib/lp/bugs/doc/bugtask.txt'
--- lib/lp/bugs/doc/bugtask.txt	2010-07-21 19:55:38 +0000
+++ lib/lp/bugs/doc/bugtask.txt	2010-07-25 22:07:47 +0000
@@ -1365,7 +1365,6 @@
     >>> for similar_bug in sorted(similar_bugs, key=attrgetter('id')):
     ...     print "%s: %s" % (similar_bug.id, similar_bug.title)
     1: Firefox does not support SVG
-    4: Reflow problems with complex page layouts
     5: Firefox install instructions should be complete
 
 This also works for distributions...
@@ -1375,12 +1374,6 @@
     >>> for similar_bug in sorted(similar_bugs, key=attrgetter('id')):
     ...     print "%s: %s" % (similar_bug.id, similar_bug.title)
     1: Firefox does not support SVG
-    2: Blackhole Trash folder
-    9: Thunderbird crashes
-    10: another test bug
-    16: a test bug
-    17: a test bug
-    18: New Bug
 
 
 ... and for SourcePackages.
@@ -1394,7 +1387,6 @@
     >>> for similar_bug in sorted(similar_bugs, key=attrgetter('id')):
     ...     print "%s: %s" % (similar_bug.id, similar_bug.title)
     1: Firefox does not support SVG
-    18: New Bug
 
 Private bugs won't show up in the list of similar bugs unless the user
 is a direct subscriber. We'll demonstrate this by creating a new bug
@@ -1406,7 +1398,6 @@
     >>> for similar_bug in sorted(similar_bugs, key=attrgetter('id')):
     ...     print "%s: %s" % (similar_bug.id, similar_bug.title)
     1: Firefox does not support SVG
-    4: Reflow problems with complex page layouts
     5: Firefox install instructions should be complete
     ...Yet another Firefox bug
 
@@ -1420,7 +1411,6 @@
     >>> for similar_bug in sorted(similar_bugs, key=attrgetter('id')):
     ...     print "%s: %s" % (similar_bug.id, similar_bug.title)
     1: Firefox does not support SVG
-    4: Reflow problems with complex page layouts
     5: Firefox install instructions should be complete
 
 

=== modified file 'lib/lp/bugs/stories/guided-filebug/xx-sorting-by-relevance.txt'
--- lib/lp/bugs/stories/guided-filebug/xx-sorting-by-relevance.txt	2010-07-22 05:48:07 +0000
+++ lib/lp/bugs/stories/guided-filebug/xx-sorting-by-relevance.txt	2010-07-25 22:07:47 +0000
@@ -5,6 +5,11 @@
 reason, the search result is sorted by how well the given summary
 matches the bug's summary and description.
 
+These bugs used to return more hits, but the more restrictive search introduced
+to bandaid search scaling issues inhibits that in the sample data; when we get
+a relevance-integrated index we'll be able to do relevance ordered, limited
+searches efficiently and these tests can be expanded at that point.
+
     >>> user_browser.open("http://launchpad.dev/products/firefox/+filebug";)
 
     >>> user_browser.getControl("Summary").value = (
@@ -16,10 +21,6 @@
     >>> print_bugs_list(user_browser.contents, "similar-bugs")
     #1 Firefox does not support SVG
     New (1 comment) last updated 2006-05-19...
-    Firefox needs to support embedded SVG...
-    #4 Reflow problems with complex page layouts
-    New (0 comments) last updated 2006-07-14...
-    Malone pages that use more complex layouts...
 
 If we instead enter a summary that matches bug #4 better, the result will
 be reversed.
@@ -27,16 +28,10 @@
     >>> user_browser.open("http://launchpad.dev/products/firefox/+filebug";)
 
     >>> user_browser.getControl("Summary").value = (
-    ...     "Reflow problems when there are SVG images on the page")
+    ...     "Reflow problems with SVG")
     >>> user_browser.getControl("Continue").click()
 
     >>> print_bugs_list(user_browser.contents, "similar-bugs")
     #4 Reflow problems with complex page layouts
     New (0 comments) last updated 2006-07-14...
     Malone pages that use more complex layouts...
-    #1 Firefox does not support SVG
-    New (1 comment) last updated 2006-05-19...
-    Firefox needs to support embedded SVG...
-    #5 Firefox install instructions should be complete
-    New (0 comments) last updated 2006-07-14...
-    All ways of downloading firefox should provide complete...