launchpad-reviewers team mailing list archive

[Merge] lp:~bryce/launchpad/lp-796645-cron-dies-on-404 into lp:launchpad

 

Bryce Harrington has proposed merging lp:~bryce/launchpad/lp-796645-cron-dies-on-404 into lp:launchpad.

Requested reviews:
  Launchpad code reviewers (launchpad-reviewers)
Related bugs:
  Bug #796645 in Launchpad itself: "update-bugzilla-remote-components.py dies on 404"
  https://bugs.launchpad.net/launchpad/+bug/796645

For more details, see:
https://code.launchpad.net/~bryce/launchpad/lp-796645-cron-dies-on-404/+merge/66541

The update-bugzilla-remote-components.py cron job can fail when getRemoteProductsAndComponents() hits various HTTPErrors on any bug tracker.  (LP: #796645)

This merge adds a general exception handler to catch and ignore any errors thrown by the urllib2 call, allowing data to be gathered from those bugzillas that are not broken.  It does not attempt any corrections, retries, or workarounds for broken bug trackers; it just leaves any previously collected data in place and carries on.
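The catch-and-continue behavior can be sketched as follows (a minimal Python 3 sketch; collect_from_trackers and fetch are hypothetical names for illustration, not Launchpad code, and urllib2.HTTPError becomes urllib.error.HTTPError on Python 3):

```python
from urllib.error import HTTPError  # urllib2.HTTPError in the Python 2 original


def collect_from_trackers(trackers, fetch):
    """Fetch each tracker's page; log and skip any tracker that errors."""
    results = {}
    for name, url in trackers:
        try:
            results[name] = fetch(url)
        except HTTPError as error:
            # Known HTTP failure: report it and move on to the next tracker.
            print("Error fetching %s: %s" % (url, error))
        except Exception:
            # Anything else: report and carry on rather than kill the cron job.
            print("Failed to access %s" % url)
    return results
```

One broken tracker no longer aborts the whole run; its entry is simply absent from the results, mirroring the `continue` in the real loop.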

Most of this branch is adding test code so we can synthetically test for HTTPErrors.

This merge modifies getRemoteProductsAndComponents() so that, instead of always creating a new scraper object per bugtracker, the test code can pass in its own scraper, which is then used for all requests.
  
The test code can then supply scrapers subclassed from BugzillaRemoteComponentScraper with hardcoded behaviors:
  
  StaticTextBugzillaRemoteComponentScraper replaces the old
  static_bugzilla_text parameter.  Instead of hardcoding a check for
  that option, the test code provides the static text via the
  subclass's getPage().
  
  FaultyBugzillaRemoteComponentScraper provides a way to simulate hitting
  various error codes by raising whatever errors we pass to it.
  
An assertGetRemoteProductsAndComponentsDoesNotAssert() helper method is added to allow us to run getRemoteProductsAndComponents() and check that no exceptions were raised.
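Under this injection scheme the test pattern looks roughly like the following (a simplified sketch; FakeScraper, run_finder, and assert_does_not_raise are hypothetical stand-ins for the actual Launchpad classes and helper):

```python
class FakeScraper:
    """Test double standing in for BugzillaRemoteComponentScraper."""

    def __init__(self, text=None, error=None):
        self.text = text    # canned page text (the StaticText... analogue)
        self.error = error  # error to raise (the Faulty... analogue)

    def getPage(self):
        if self.error is not None:
            raise self.error
        return self.text


def run_finder(scraper):
    """Sketch of the finder loop body: use the injected scraper, swallow errors."""
    try:
        return scraper.getPage()
    except Exception:
        return None  # tracker skipped; previously collected data untouched


def assert_does_not_raise(scraper):
    """Analogue of the assertGetRemoteProductsAndComponentsDoesNotAssert helper."""
    raised = None
    try:
        run_finder(scraper)
    except Exception as e:
        raised = e
    assert raised is None
```

Because the scraper is injected rather than constructed inside the loop, the tests can exercise both the happy path and each error path without any network access.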

-- 
https://code.launchpad.net/~bryce/launchpad/lp-796645-cron-dies-on-404/+merge/66541
Your team Launchpad code reviewers is requested to review the proposed merge of lp:~bryce/launchpad/lp-796645-cron-dies-on-404 into lp:launchpad.
=== modified file 'lib/lp/bugs/scripts/bzremotecomponentfinder.py'
--- lib/lp/bugs/scripts/bzremotecomponentfinder.py	2011-06-02 22:27:11 +0000
+++ lib/lp/bugs/scripts/bzremotecomponentfinder.py	2011-07-01 02:59:24 +0000
@@ -102,19 +102,20 @@
         u"mozilla.org",
         ]
 
-    def __init__(self, logger=None, static_bugzilla_text=None):
+    def __init__(self, logger=None, static_bugzilla_scraper=None):
         """Instantiates object, without performing any parsing.
 
         :param logger: A logger object
-        :param static_bugzilla_text: Instead of retrieving the remote
-         web page for a bug tracker, act as if this static text was
-         returned.  This is intended for testing purposes to avoid
-         needing to make remote web connections.
+        :param static_bugzilla_scraper: Substitute this custom bugzilla
+         scraper object instead of constructing a new
+         BugzillaRemoteComponentScraper for each bugtracker's URL.  This
+         is intended for testing purposes to avoid needing to make remote
+         web connections.
         """
         self.logger = logger
         if logger is None:
             self.logger = default_log
-        self.static_bugzilla_text = static_bugzilla_text
+        self.static_bugzilla_scraper = static_bugzilla_scraper
 
     def getRemoteProductsAndComponents(self, bugtracker_name=None):
         lp_bugtrackers = getUtility(IBugTrackerSet)
@@ -134,21 +135,24 @@
 
             self.logger.info("%s: %s" % (
                 lp_bugtracker.name, lp_bugtracker.baseurl))
-            bz_bugtracker = BugzillaRemoteComponentScraper(
-                base_url=lp_bugtracker.baseurl)
-
-            if self.static_bugzilla_text is not None:
-                self.logger.debug("Using static bugzilla text")
-                page_text = self.static_bugzilla_text
-
+
+            if self.static_bugzilla_scraper is not None:
+                bz_bugtracker = self.static_bugzilla_scraper
             else:
-                try:
-                    self.logger.debug("...Fetching page")
-                    page_text = bz_bugtracker.getPage()
-                except HTTPError, error:
-                    self.logger.error("Error fetching %s: %s" % (
-                        lp_bugtracker.baseurl, error))
-                    continue
+                bz_bugtracker = BugzillaRemoteComponentScraper(
+                    base_url=lp_bugtracker.baseurl)
+
+            try:
+                self.logger.debug("...Fetching page")
+                page_text = bz_bugtracker.getPage()
+            except HTTPError, error:
+                self.logger.error("Error fetching %s: %s" % (
+                    lp_bugtracker.baseurl, error))
+                continue
+            except:
+                self.logger.error("Failed to access %s" % (
+                    lp_bugtracker.baseurl))
+                continue
 
             self.logger.debug("...Parsing html")
             bz_bugtracker.parsePage(page_text)

=== modified file 'lib/lp/bugs/tests/test_bzremotecomponentfinder.py'
--- lib/lp/bugs/tests/test_bzremotecomponentfinder.py	2011-06-02 22:27:11 +0000
+++ lib/lp/bugs/tests/test_bzremotecomponentfinder.py	2011-07-01 02:59:24 +0000
@@ -9,6 +9,7 @@
 
 import os
 import transaction
+from urllib2 import HTTPError
 
 from canonical.testing import DatabaseFunctionalLayer
 from canonical.launchpad.ftests import (
@@ -25,7 +26,6 @@
     ADMIN_EMAIL,
     )
 
-
 def read_test_file(name):
     """Return the contents of the test file named :name:
 
@@ -36,6 +36,31 @@
     return test_file.read()
 
 
+class StaticTextBugzillaRemoteComponentScraper(
+    BugzillaRemoteComponentScraper):
+    """A scraper that just returns static text for getPage()"""
+    def __init__(self):
+        BugzillaRemoteComponentScraper.__init__(
+            self, "http://www.example.com")
+
+    def getPage(self):
+        return read_test_file("bugzilla-fdo-advanced-query.html")
+
+
+class FaultyBugzillaRemoteComponentScraper(
+    BugzillaRemoteComponentScraper):
+    """A scraper that trips asserts when getPage() is called"""
+
+    def __init__(self, error=None):
+        BugzillaRemoteComponentScraper.__init__(
+            self, "http://www.example.com")
+        self.error = error
+
+    def getPage(self):
+        raise self.error
+        return None
+
+
 class TestBugzillaRemoteComponentScraper(TestCaseWithFactory):
 
     layer = DatabaseFunctionalLayer
@@ -97,6 +122,14 @@
         super(TestBugzillaRemoteComponentFinder, self).setUp()
         login(ADMIN_EMAIL)
 
+    def assertGetRemoteProductsAndComponentsDoesNotAssert(self, finder):
+        asserted = None
+        try:
+            finder.getRemoteProductsAndComponents()
+        except Exception as e:
+            asserted = e
+        self.assertIs(None, asserted)
+
     def test_store(self):
         """Check that already-parsed data gets stored to database"""
         lp_bugtracker = self.factory.makeBugTracker()
@@ -146,13 +179,11 @@
             title="fdo-example",
             name="fdo-example")
         transaction.commit()
-        BugzillaRemoteComponentScraper(
-            base_url="http://bugzilla.example.org")
+        bz_scraper = StaticTextBugzillaRemoteComponentScraper()
 
-        page_text = read_test_file("bugzilla-fdo-advanced-query.html")
         finder = BugzillaRemoteComponentFinder(
             logger=BufferLogger(),
-            static_bugzilla_text=page_text)
+            static_bugzilla_scraper=bz_scraper)
         finder.getRemoteProductsAndComponents(bugtracker_name="fdo-example")
 
         self.assertEqual(
@@ -164,6 +195,50 @@
         self.assertIsNot(None, comp)
         self.assertEqual(u'Driver/Radeon', comp.name)
 
+    def test_get_remote_products_and_components_encounters_301(self):
+        lp_bugtracker = self.factory.makeBugTracker()
+        transaction.commit()
+        bz_scraper = FaultyBugzillaRemoteComponentScraper(
+            error=HTTPError("http://bugzilla.example.com",
+                            301, 'Moved Permanently', {}, None))
+        finder = BugzillaRemoteComponentFinder(
+            logger=BufferLogger(), static_bugzilla_scraper=bz_scraper)
+
+        self.assertGetRemoteProductsAndComponentsDoesNotAssert(finder)
+
+    def test_get_remote_products_and_components_encounters_400(self):
+        lp_bugtracker = self.factory.makeBugTracker()
+        transaction.commit()
+        bz_scraper = FaultyBugzillaRemoteComponentScraper(
+            error=HTTPError("http://bugzilla.example.com",
+                            400, 'Bad Request', {}, None))
+        finder = BugzillaRemoteComponentFinder(
+            logger=BufferLogger(), static_bugzilla_scraper=bz_scraper)
+
+        self.assertGetRemoteProductsAndComponentsDoesNotAssert(finder)
+
+    def test_get_remote_products_and_components_encounters_404(self):
+        lp_bugtracker = self.factory.makeBugTracker()
+        transaction.commit()
+        bz_scraper = FaultyBugzillaRemoteComponentScraper(
+            error=HTTPError("http://bugzilla.example.com",
+                            404, 'Not Found', {}, None))
+        finder = BugzillaRemoteComponentFinder(
+            logger=BufferLogger(), static_bugzilla_scraper=bz_scraper)
+
+        self.assertGetRemoteProductsAndComponentsDoesNotAssert(finder)
+
+    def test_get_remote_products_and_components_encounters_500(self):
+        lp_bugtracker = self.factory.makeBugTracker()
+        transaction.commit()
+        bz_scraper = FaultyBugzillaRemoteComponentScraper(
+            error=HTTPError("http://bugzilla.example.com",
+                            500, 'Internal Server Error', {}, None))
+        finder = BugzillaRemoteComponentFinder(
+            logger=BufferLogger(), static_bugzilla_scraper=bz_scraper)
+
+        self.assertGetRemoteProductsAndComponentsDoesNotAssert(finder)
+
 # FIXME: This takes ~9 sec to run, but mars says new testsuites need to
 #        compete in 2
 #    def test_cronjob(self):