← Back to team overview

launchpad-reviewers team mailing list archive

[Merge] lp:~jtv/launchpad/bug-34102 into lp:launchpad

 

Jeroen T. Vermeulen has proposed merging lp:~jtv/launchpad/bug-34102 into lp:launchpad.

Requested reviews:
  Launchpad code reviewers (launchpad-reviewers): code
Related bugs:
  #34102 checkwatches.py doesn't handle http -> https redirects
  https://bugs.launchpad.net/bugs/34102

For more details, see:
https://code.launchpad.net/~jtv/launchpad/bug-34102/+merge/46685

= Bug 34102 =

This is an old bug but, as it turns out, a nasty one.  It happens when searching bugs on a remote BugZilla instance that is not running any version of the Launchpad plugin.

What happened when the bug was reported, the following happened: a POST request to the bug-search form got redirected to https.  As per RFC, urlopen assumes that the POST has been executed and it's being redirected to a result page.  But actually it ends up back on the search page, without any visible effect from the POST.

What the code expects to find there is an XML file (RDF to be precise) detailing the requested bugs.  Instead it got an HTML file that wasn't well-formed XML, and this triggered a parse error.

Since then, bugzilla seems to have fixed the XML parse error on the page.  This doesn't make things any better; it makes things worse.  Because now we successfully parse the HTML page, find no bugs, and fail to notice the problem.

In this branch, Gavin Panella and I fixed this by allowing bug trackers to specify that, on redirect, a POST should be re-POSTed to the URL it's being redirected to.  The Bugzilla code now uses this for the search page.

We also added a small check against receiving <html> pages instead of the expected <bugzilla> ones.  We're not quite sure it'll always be <bugzilla>; some of the tests use <RDF> or <issuezilla>.  So we ended up being liberal, checking only against accidentally receiving HTML pages.  We'll have to see how well this holds up in practice against the bugzilla instances out there.

To test,
{{{
./bin/test -vvc lp.bugs -t externalbugtracker -t bugzilla
}}}

To Q/A, probably the best we can do is have an admin exercise the code in "make harness" to see that it still works for selected Bugzilla instances.


No lint,

Jeroen
-- 
https://code.launchpad.net/~jtv/launchpad/bug-34102/+merge/46685
Your team Launchpad code reviewers is requested to review the proposed merge of lp:~jtv/launchpad/bug-34102 into lp:launchpad.
=== modified file 'lib/lp/bugs/doc/externalbugtracker-bugzilla.txt'
--- lib/lp/bugs/doc/externalbugtracker-bugzilla.txt	2010-10-18 22:24:59 +0000
+++ lib/lp/bugs/doc/externalbugtracker-bugzilla.txt	2011-01-18 21:49:03 +0000
@@ -664,7 +664,7 @@
 bug tracker, an error is logged.
 
     >>> external_bugzilla._postPage = (
-    ...     lambda self, data: '<invalid xml>')
+    ...     lambda self, data, repost_on_redirect: '<invalid xml>')
     >>> bug_watch_updater.updateBugWatches(
     ...     external_bugzilla, [bug_watch1, bug_watch2])
     Traceback (most recent call last):

=== modified file 'lib/lp/bugs/externalbugtracker/base.py'
--- lib/lp/bugs/externalbugtracker/base.py	2010-08-21 07:31:45 +0000
+++ lib/lp/bugs/externalbugtracker/base.py	2011-01-18 21:49:03 +0000
@@ -256,17 +256,31 @@
                                   headers={'User-agent': LP_USER_AGENT})
         return self._fetchPage(request).read()
 
-    def _postPage(self, page, form):
-        """POST to the specified page.
-
-        :form: is a dict of form variables being POSTed.
+    def _post(self, url, data):
+        """Post to a given URL."""
+        request = urllib2.Request(url, headers={'User-agent': LP_USER_AGENT})
+        return self.urlopen(request, data=data)
+
+    def _postPage(self, page, form, repost_on_redirect=False):
+        """POST to the specified page and form.
+
+        :param form: is a dict of form variables being POSTed.
+        :param repost_on_redirect: override RFC-compliant redirect handling.
+            By default, if the POST receives a redirect response, the
+            request to the redirection's target URL will be a GET.  If
+            `repost_on_redirect` is True, this method will do a second POST
+            instead.  Do this only if you are sure that repeated POST to
+            this page is safe, as is usually the case with search forms.
         """
         url = "%s/%s" % (self.baseurl, page)
         post_data = urllib.urlencode(form)
-        request = urllib2.Request(url, headers={'User-agent': LP_USER_AGENT})
-        url = self.urlopen(request, data=post_data)
-        page_contents = url.read()
-        return page_contents
+
+        response = self._post(url, data=post_data)
+
+        if repost_on_redirect and response.url != url:
+            response = self._post(response.url, data=post_data)
+
+        return response.read()
 
 
 class LookupBranch(treelookup.LookupBranch):

=== modified file 'lib/lp/bugs/externalbugtracker/bugzilla.py'
--- lib/lp/bugs/externalbugtracker/bugzilla.py	2010-10-15 11:21:00 +0000
+++ lib/lp/bugs/externalbugtracker/bugzilla.py	2011-01-18 21:49:03 +0000
@@ -224,17 +224,17 @@
         return tuple(int(number) for number in version_numbers)
 
     _importance_lookup = {
-        'blocker':     BugTaskImportance.CRITICAL,
-        'critical':    BugTaskImportance.CRITICAL,
-        'immediate':   BugTaskImportance.CRITICAL,
-        'urgent':      BugTaskImportance.CRITICAL,
-        'major':       BugTaskImportance.HIGH,
-        'high':        BugTaskImportance.HIGH,
-        'normal':      BugTaskImportance.MEDIUM,
-        'medium':      BugTaskImportance.MEDIUM,
-        'minor':       BugTaskImportance.LOW,
-        'low':         BugTaskImportance.LOW,
-        'trivial':     BugTaskImportance.LOW,
+        'blocker': BugTaskImportance.CRITICAL,
+        'critical': BugTaskImportance.CRITICAL,
+        'immediate': BugTaskImportance.CRITICAL,
+        'urgent': BugTaskImportance.CRITICAL,
+        'major': BugTaskImportance.HIGH,
+        'high': BugTaskImportance.HIGH,
+        'normal': BugTaskImportance.MEDIUM,
+        'medium': BugTaskImportance.MEDIUM,
+        'minor': BugTaskImportance.LOW,
+        'low': BugTaskImportance.LOW,
+        'trivial': BugTaskImportance.LOW,
         'enhancement': BugTaskImportance.WISHLIST,
         }
 
@@ -305,6 +305,20 @@
         """See `ExternalBugTracker`."""
         return (bug_id, self.getRemoteBugBatch([bug_id]))
 
+    def _checkBugSearchResult(self, document):
+        """Does `document` appear to be a bug search result page?
+
+        :param document: An `xml.dom.Document` built from a bug search result
+            on the bugzilla instance.
+        :raise UnparseableBugData: If `document` does not appear to be a bug
+            search result.
+        """
+        root = document.documentElement
+        if root.tagName == 'html':
+            raise UnparseableBugData(
+                "Bug search on %s returned a <%s> instead of an RDF page." % (
+                    self.baseurl, root.tagName))
+
     def getRemoteBugBatch(self, bug_ids):
         """See `ExternalBugTracker`."""
         # XXX: GavinPanella 2007-10-25 bug=153532: The modification of
@@ -316,12 +330,13 @@
         # getRemoteStatus.
         if self.is_issuezilla:
             buglist_page = 'xml.cgi'
-            data = {'download_type' : 'browser',
-                    'output_configured' : 'true',
-                    'include_attachments' : 'false',
-                    'include_dtd' : 'true',
-                    'id'      : ','.join(bug_ids),
-                    }
+            data = {
+                'download_type': 'browser',
+                'output_configured': 'true',
+                'include_attachments': 'false',
+                'include_dtd': 'true',
+                'id': ','.join(bug_ids),
+                }
             bug_tag = 'issue'
             id_tag = 'issue_id'
             status_tag = 'issue_status'
@@ -339,15 +354,16 @@
             severity_tag = 'bug_severity'
         else:
             buglist_page = 'buglist.cgi'
-            data = {'form_name'   : 'buglist.cgi',
-                    'bug_id_type' : 'include',
-                    'columnlist'  : 'id,product,bug_status,resolution',
-                    'bug_id'      : ','.join(bug_ids),
-                    }
+            data = {
+                'form_name': 'buglist.cgi',
+                'bug_id_type': 'include',
+                'columnlist': 'id,product,bug_status,resolution',
+                'bug_id': ','.join(bug_ids),
+                }
             if self.version < (2, 17, 1):
-                data.update({'format' : 'rdf'})
+                data.update({'format': 'rdf'})
             else:
-                data.update({'ctype'  : 'rdf'})
+                data.update({'ctype': 'rdf'})
             bug_tag = 'bz:bug'
             id_tag = 'bz:id'
             status_tag = 'bz:bug_status'
@@ -355,12 +371,15 @@
             priority_tag = 'bz:priority'
             severity_tag = 'bz:bug_severity'
 
-        buglist_xml = self._postPage(buglist_page, data)
+        buglist_xml = self._postPage(
+            buglist_page, data, repost_on_redirect=True)
+
         try:
             document = self._parseDOMString(buglist_xml)
         except xml.parsers.expat.ExpatError, e:
             raise UnparseableBugData('Failed to parse XML description for '
                 '%s bugs %s: %s' % (self.baseurl, bug_ids, e))
+        self._checkBugSearchResult(document)
 
         bug_nodes = document.getElementsByTagName(bug_tag)
         for bug_node in bug_nodes:

=== added file 'lib/lp/bugs/externalbugtracker/tests/test_bugzilla.py'
--- lib/lp/bugs/externalbugtracker/tests/test_bugzilla.py	1970-01-01 00:00:00 +0000
+++ lib/lp/bugs/externalbugtracker/tests/test_bugzilla.py	2011-01-18 21:49:03 +0000
@@ -0,0 +1,57 @@
+# Copyright 2010 Canonical Ltd.  This software is licensed under the
+# GNU Affero General Public License version 3 (see the file LICENSE).
+
+"""Tests for the Bugzilla BugTracker."""
+
+__metaclass__ = type
+
+from StringIO import StringIO
+
+from canonical.testing.layers import DatabaseFunctionalLayer
+from lp.bugs.externalbugtracker.base import UnparseableBugData
+from lp.bugs.externalbugtracker.bugzilla import Bugzilla
+from lp.testing import TestCaseWithFactory
+from lp.testing.fakemethod import FakeMethod
+
+
+class TestBugzillaGetRemoteBugBatch(TestCaseWithFactory):
+    """Test POSTs to Bugzilla's bug-search page."""
+    layer = DatabaseFunctionalLayer
+
+    base_url = "http://example.com/";
+
+    def _makeInstrumentedBugzilla(self, page=None, content=None):
+        """Create a `Bugzilla` with a fake urlopen."""
+        if page is None:
+            page = self.factory.getUniqueString()
+        bugzilla = Bugzilla(self.base_url)
+        if content is None:
+            content = "<bugzilla>%s</bugzilla>" % (
+                self.factory.getUniqueString())
+        fake_page = StringIO(content)
+        fake_page.url = self.base_url + page
+        bugzilla.urlopen = FakeMethod(result=fake_page)
+        return bugzilla
+
+    def test_post_to_search_form_does_not_crash(self):
+        page = self.factory.getUniqueString()
+        bugzilla = self._makeInstrumentedBugzilla(page)
+        bugzilla.getRemoteBugBatch([])
+
+    def test_repost_on_redirect_does_not_crash(self):
+        bugzilla = self._makeInstrumentedBugzilla()
+        bugzilla.getRemoteBugBatch([])
+
+    def test_reports_invalid_search_result(self):
+        # Sometimes bug searches may go wrong, yielding an HTML page
+        # instead.  getRemoteBugBatch rejects and reports search results
+        # of the wrong page type.
+        result_text = """
+            <html>
+                <body>
+                    <h1>This is not actually a search result.</h1>
+                </body>
+            </html>
+            """
+        bugzilla = self._makeInstrumentedBugzilla(content=result_text)
+        self.assertRaises(UnparseableBugData, bugzilla.getRemoteBugBatch, [])

=== modified file 'lib/lp/bugs/externalbugtracker/tests/test_externalbugtracker.py'
--- lib/lp/bugs/externalbugtracker/tests/test_externalbugtracker.py	2010-08-20 20:31:18 +0000
+++ lib/lp/bugs/externalbugtracker/tests/test_externalbugtracker.py	2011-01-18 21:49:03 +0000
@@ -5,7 +5,7 @@
 
 __metaclass__ = type
 
-import unittest
+from StringIO import StringIO
 
 from zope.interface import implements
 
@@ -17,14 +17,17 @@
     ISupportsCommentPushing,
     )
 from lp.testing import TestCase
+from lp.testing.fakemethod import FakeMethod
 
 
 class BackLinkingExternalBugTracker(ExternalBugTracker):
     implements(ISupportsBackLinking)
 
+
 class CommentImportingExternalBugTracker(ExternalBugTracker):
     implements(ISupportsCommentImport)
 
+
 class CommentPushingExternalBugTracker(ExternalBugTracker):
     implements(ISupportsCommentPushing)
 
@@ -88,6 +91,56 @@
         tracker = DebBugs(self.base_url)
         self.assertFalse(tracker.sync_comments)
 
-
-def test_suite():
-    return unittest.TestLoader().loadTestsFromName(__name__)
+    def _makeFakePostForm(self, base_url, page=None):
+        """Create a fake `urllib2.urlopen` result."""
+        content = "<bugzilla>%s</bugzilla>" % self.factory.getUniqueString()
+        fake_form = StringIO(content)
+        if page is None:
+            page = self.factory.getUniqueString()
+        fake_form.url = base_url + page
+        return fake_form
+
+    def _fakeExternalBugTracker(self, base_url, fake_form):
+        """Create an `ExternalBugTracker` with a fake `_post` method."""
+        bugtracker = ExternalBugTracker(base_url)
+        bugtracker._post = FakeMethod(result=fake_form)
+        return bugtracker
+
+    def test_postPage_returns_response_page(self):
+        # _postPage posts, then returns the page text it gets back from
+        # the server.
+        base_url = "http://example.com/";
+        form = self.factory.getUniqueString()
+        fake_form = self._makeFakePostForm(base_url, page=form)
+        bugtracker = self._fakeExternalBugTracker(base_url, fake_form)
+        self.assertEqual(fake_form.getvalue(), bugtracker._postPage(form, {}))
+
+    def test_postPage_does_not_repost_on_redirect(self):
+        # By default, if the POST redirects, _postPage leaves urllib2 to
+        # handle it in the normal, RFC-compliant way.
+        base_url = "http://example.com/";
+        form = self.factory.getUniqueString()
+        fake_form = self._makeFakePostForm(base_url)
+        bugtracker = self._fakeExternalBugTracker(base_url, fake_form)
+
+        bugtracker._postPage(form, {})
+
+        self.assertEqual(1, bugtracker._post.call_count)
+        args, kwargs = bugtracker._post.calls[0]
+        self.assertEqual((base_url + form, ), args)
+
+    def test_postPage_can_repost_on_redirect(self):
+        # Some pages (that means you, BugZilla bug-search page!) can
+        # redirect on POST, but without honouring the POST.  Standard
+        # urllib2 behaviour is to redirect to a GET, but if the caller
+        # says it's safe, _postPage can re-do the POST at the new URL.
+        base_url = "http://example.com/";
+        form = self.factory.getUniqueString()
+        fake_form = self._makeFakePostForm(base_url)
+        bugtracker = self._fakeExternalBugTracker(base_url, fake_form)
+
+        bugtracker._postPage(form, form={}, repost_on_redirect=True)
+
+        self.assertEqual(2, bugtracker._post.call_count)
+        last_args, last_kwargs = bugtracker._post.calls[-1]
+        self.assertEqual((fake_form.url, ), last_args)

=== modified file 'lib/lp/bugs/tests/externalbugtracker.py'
--- lib/lp/bugs/tests/externalbugtracker.py	2010-11-08 14:16:17 +0000
+++ lib/lp/bugs/tests/externalbugtracker.py	2011-01-18 21:49:03 +0000
@@ -300,7 +300,7 @@
         else:
             raise AssertionError('Unknown page: %s' % page)
 
-    def _postPage(self, page, form):
+    def _postPage(self, page, form, repost_on_redirect=False):
         """POST to the specified page.
 
         :form: is a dict of form variables being POSTed.
@@ -1119,7 +1119,7 @@
         else:
             return ''
 
-    def _postPage(self, page, form):
+    def _postPage(self, page, form, repost_on_redirect=False):
         if self.trace_calls:
             print "CALLED _postPage(%r, ...)" % (page)
         return ''