← Back to team overview

launchpad-reviewers team mailing list archive

[Merge] lp:~adeuring/launchpad/cronscript-for-hwdb-report-fixes into lp:launchpad

 

Abel Deuring has proposed merging lp:~adeuring/launchpad/cronscript-for-hwdb-report-fixes into lp:launchpad.

Requested reviews:
  Launchpad code reviewers (launchpad-reviewers)

For more details, see:
https://code.launchpad.net/~adeuring/launchpad/cronscript-for-hwdb-report-fixes/+merge/75214

This branch converts scripts/reprocess-hwdb-submissions.py into cronscripts/reprocess-hwdb-submissions.py.

We need to reprocess a larger number of reports to the hardware database, coming from checkbox in Lucid, Marvierick and Natty, which could not be processed due to, well... a number of reasons. The core fixes which allow to process these submissions are already merged, but we want to include data from those reports that are marked as bad.

The number of these reports is in the order of 10**5 -- more than should be done in a single script run. The idea of the script simple: Run a query like

  SELECT ... FROM HWSubmission WHERE id > (variable number) AND status=(bad) LIMIT whatever;

in a TunableLoop: The first run should begin with the first bad report that could come from Lucid; it tries to process a number of reports and records the ID of the last touched report. In the next run, it continues with higher IDs.

(variable number) above is stored in a file; when the script starts, it reads the value from the file; when it finishes, it writes a new value into the same file.

The test test_run_reprocessing_script_two_batches() turned out to be quite useful. It reveealed that the lines removed below:

@@ -3113,8 +3112,6 @@
         submissions = removeSecurityProxy(submissions).find(
             HWSubmission.id >= self.start)
         submissions = list(submissions[:chunk_size])
-        if len(submissions) > 0:
-            self.start = submissions[-1].id + 1
         return submissions

were nonsense: The method __call__() can terminate before all submissions from a chunk have been processed: When the number of already processed submissions goes beyond self.max_submissions:

             if self.max_submissions is not None:
                 if self.max_submissions <= (
                     self.valid_submissions + self.invalid_submissions):
                     self.finished = True
                     break

The new tests look a bit odd: All except one are disabled. I described the reason in bug 849056...

test: ./bin/test hardwaredb -vvt lp.hardwaredb.scripts.tests.test_hwdbsubmissions

I removed a bit of lint from lib/lp/hardwaredb/scripts/hwdbsubmissions.py but some more remains:

= Launchpad lint =

Checking for conflicts and issues in changed files.

Linting changed files:
  cronscripts/reprocess-hwdb-submissions.py
  lib/lp/hardwaredb/scripts/hwdbsubmissions.py
  lib/lp/hardwaredb/scripts/tests/test_hwdbsubmissions.py

./cronscripts/reprocess-hwdb-submissions.py
      27: '_pythonpath' imported but unused
./lib/lp/hardwaredb/scripts/hwdbsubmissions.py
      26: redefinition of unused 'etree' from line 24
    3070: local variable 'error' is assigned to but never used
    1696: E261 at least two spaces before inline comment
./lib/lp/hardwaredb/scripts/tests/test_hwdbsubmissions.py
     159: Line exceeds 78 characters.

-- 
https://code.launchpad.net/~adeuring/launchpad/cronscript-for-hwdb-report-fixes/+merge/75214
Your team Launchpad code reviewers is requested to review the proposed merge of lp:~adeuring/launchpad/cronscript-for-hwdb-report-fixes into lp:launchpad.
=== renamed file 'scripts/reprocess-hwdb-submissions.py' => 'cronscripts/reprocess-hwdb-submissions.py'
--- scripts/reprocess-hwdb-submissions.py	2011-09-01 15:10:31 +0000
+++ cronscripts/reprocess-hwdb-submissions.py	2011-09-13 15:54:31 +0000
@@ -14,21 +14,24 @@
         which will be processed.
 
 This script iterates over the HWDB submissions with the status
-SUBMITTED, beginning with the oldest submissions, populate the
-HWDB tables with the data from these submissions.
+INVALID. It processes only submissions with an ID greater or equal
+than the number specified by the file given a option -s.
+
+When the script terminates, it writes the ID of the last processed
+submission into this file.
 
 Properly processed submissions are set to the status PROCESSED;
-submissions that cannot be processed are set to the status INVALID.
+submissions that cannot be processed retain the status INVALID.
 """
 
 import _pythonpath
 
-from lp.services.scripts.base import LaunchpadScript
+from lp.services.scripts.base import LaunchpadCronScript
 from lp.hardwaredb.scripts.hwdbsubmissions import (
     reprocess_invalid_submissions)
 
 
-class HWDBSubmissionProcessor(LaunchpadScript):
+class HWDBSubmissionProcessor(LaunchpadCronScript):
 
     def add_my_options(self):
         """See `LaunchpadScript`."""
@@ -39,9 +42,10 @@
             '-w', '--warnings', action="store_true", default=False,
             help='Include warnings.')
         self.parser.add_option(
-            '-s', '--start',
-            help=('Process HWSubmission records having an id greater or '
-                  'equal than this value.'))
+            '-s', '--start-file', default=None,
+            help=('The name of a file storing the smallest ID of a\n'
+                  'hardware database submission that should be processed.\n'
+                  'This script must have read and write access to the file.'))
 
     def main(self):
         max_submissions = self.options.max_submissions
@@ -57,22 +61,38 @@
                 self.logger.error(
                     '--max_submissions must be a positive integer.')
                 return
-        if self.options.start is None:
-            self.logger.error('Option --start not specified.')
-            return
-        try:
-            start = int(self.options.start)
+
+        if self.options.start_file is None:
+            self.logger.error('Option --start-file not specified.')
+            return
+        try:
+            start_file = open(self.options.start_file, 'r+')
+            start_id = start_file.read().strip()
+        except IOError, error:
+            self.logger.error(
+                'Cannot access file %s: %s' % (
+                    self.options.start_file, error))
+            return
+        try:
+            start_id = int(start_id)
         except ValueError:
-            self.logger.error('Option --start must have an integer value.')
+            self.logger.error(
+                '%s must contain only an integer' % self.options.start_file)
             return
-        if start < 0:
-            self.logger.error('--start must be a positive integer.')
+        if start_id < 0:
+            self.logger.error(
+                '%s must contain a positive integer'
+                % self.options.start_file)
             return
 
-        reprocess_invalid_submissions(
-            start, self.txn, self.logger,
+        next_start = reprocess_invalid_submissions(
+            start_id, self.txn, self.logger,
             max_submissions, self.options.warnings)
 
+        start_file.seek(0)
+        start_file.write('%i' % next_start)
+        start_file.close()
+
 if __name__ == '__main__':
     script = HWDBSubmissionProcessor(
         'hwdbsubmissions', dbuser='hwdb-submission-processor')

=== modified file 'lib/lp/hardwaredb/scripts/hwdbsubmissions.py'
--- lib/lp/hardwaredb/scripts/hwdbsubmissions.py	2011-09-05 12:07:35 +0000
+++ lib/lp/hardwaredb/scripts/hwdbsubmissions.py	2011-09-13 15:54:31 +0000
@@ -129,6 +129,7 @@
 UDEV_USB_TYPE_RE = re.compile('^[0-9]{1,3}/[0-9]{1,3}/[0-9]{1,3}$')
 SYSFS_SCSI_DEVICE_ATTRIBUTES = set(('vendor', 'model', 'type'))
 
+
 class SubmissionParser(object):
     """A Parser for the submissions to the hardware database."""
 
@@ -405,7 +406,6 @@
 
         :return: (name, (value, type)) of a property.
         """
-        property_name = property_node.get('name')
         return (property_node.get('name'),
                 self._getValueAndType(property_node))
 
@@ -1004,7 +1004,7 @@
                  the content.
         """
         self.submission_key = submission_key
-        submission_doc  = self._getValidatedEtree(submission, submission_key)
+        submission_doc = self._getValidatedEtree(submission, submission_key)
         if submission_doc is None:
             return None
 
@@ -1565,7 +1565,7 @@
                     'Invalid device path name: %r' % path_name,
                     self.submission_key)
                 return False
-            for parent_path in path_names[path_index+1:]:
+            for parent_path in path_names[path_index + 1:]:
                 if path_name.startswith(parent_path):
                     self.devices[parent_path].addChild(
                         self.devices[path_name])
@@ -2822,7 +2822,6 @@
             # SubmissionParser.checkUdevScsiProperties() ensures that
             # each SCSI device has a record in self.sysfs and that
             # the attribute 'vendor' exists.
-            path = self.udev['P']
             return self.sysfs['vendor']
         else:
             return None
@@ -2834,7 +2833,6 @@
             # SubmissionParser.checkUdevScsiProperties() ensures that
             # each SCSI device has a record in self.sysfs and that
             # the attribute 'model' exists.
-            path = self.udev['P']
             return self.sysfs['model']
         else:
             return None
@@ -3080,6 +3078,7 @@
                 # further submissions in this batch raise an exception.
                 self.transaction.commit()
 
+            self.start = submission.id + 1
             if self.max_submissions is not None:
                 if self.max_submissions <= (
                     self.valid_submissions + self.invalid_submissions):
@@ -3113,8 +3112,6 @@
         submissions = removeSecurityProxy(submissions).find(
             HWSubmission.id >= self.start)
         submissions = list(submissions[:chunk_size])
-        if len(submissions) > 0:
-            self.start = submissions[-1].id + 1
         return submissions
 
 
@@ -3139,6 +3136,7 @@
         'Processed %i valid and %i invalid HWDB submissions'
         % (loop.valid_submissions, loop.invalid_submissions))
 
+
 def reprocess_invalid_submissions(start, transaction, logger,
                                   max_submissions=None, record_warnings=True):
     """Reprocess invalid submissions.
@@ -3160,3 +3158,4 @@
         'Processed %i valid and %i invalid HWDB submissions'
         % (loop.valid_submissions, loop.invalid_submissions))
     logger.info('last processed: %i' % loop.start)
+    return loop.start

=== modified file 'lib/lp/hardwaredb/scripts/tests/test_hwdbsubmissions.py'
--- lib/lp/hardwaredb/scripts/tests/test_hwdbsubmissions.py	2011-09-01 16:43:49 +0000
+++ lib/lp/hardwaredb/scripts/tests/test_hwdbsubmissions.py	2011-09-13 15:54:31 +0000
@@ -5,7 +5,10 @@
 
 __metaclass__ = type
 
+from storm.store import Store
+from tempfile import mktemp
 
+from canonical.launchpad.ftests.script import run_script
 from canonical.testing.layers import LaunchpadScriptLayer
 from lp.hardwaredb.interfaces.hwdb import HWSubmissionProcessingStatus
 from lp.hardwaredb.scripts.hwdbsubmissions import (
@@ -13,6 +16,8 @@
     ProcessingLoopForReprocessingBadSubmissions,
     )
 from lp.testing import TestCaseWithFactory
+from lp.testing.matchers import Contains
+import transaction
 
 
 class TestProcessingLoops(TestCaseWithFactory):
@@ -102,7 +107,8 @@
         submissions = loop.getUnprocessedSubmissions(1)
         self.assertEqual(1, len(submissions))
 
-    def test_BadSubmissions_respects_start(self):
+    # Disabled due to bug 849056.
+    def xxx_test_BadSubmissions_respects_start(self):
         # It is possible to request a start id. Previous entries are ignored.
         submission1 = self.factory.makeHWSubmission(
             status=HWSubmissionProcessingStatus.INVALID)
@@ -113,3 +119,126 @@
         # The sample data already contains one submission.
         submissions = loop.getUnprocessedSubmissions(2)
         self.assertEqual([submission2], submissions)
+
+    # Disabled due to bug 849056.
+    def xxx_test_run_reprocessing_script_no_params(self):
+        # cronscripts/reprocess-hwdb-submissions.py needs at least the
+        # parameter --start-file
+        retcode, stdout, stderr = run_script(
+            'cronscripts/reprocess-hwdb-submissions.py', [])
+        self.assertThat(
+            stderr, Contains('Option --start-file not specified.'))
+
+    # Disabled due to bug 849056.
+    def xxx_test_run_reprocessing_script_startfile_does_not_exist(self):
+        # If the specified start file does not exist,
+        # cronscripts/reprocess-hwdb-submissions.py reports an error.
+        does_not_exist = mktemp()
+        retcode, stdout, stderr = run_script(
+            'cronscripts/reprocess-hwdb-submissions.py',
+            ['--start-file', does_not_exist])
+        self.assertThat(
+            stderr, Contains('Cannot access file %s' % does_not_exist))
+
+    # Disabled due to bug 849056.
+    def xxx_test_run_reprocessing_script_startfile_without_integer(self):
+        # If the specified start file contains any non-integer string,
+        # cronscripts/reprocess-hwdb-submissions.py reports an error.
+        start_file_name = mktemp()
+        start_file = open(start_file_name, 'w')
+        start_file.write('nonsense')
+        start_file.close()
+        retcode, stdout, stderr = run_script(
+            'cronscripts/reprocess-hwdb-submissions.py',
+            ['--start-file', start_file_name])
+        self.assertThat(
+            stderr,
+            Contains('%s must contain only an integer' % start_file_name))
+
+    # Disabled due to bug 849056.
+    def xxx_test_run_reprocessing_script_startfile_with_negative_integer(self):
+        # If the specified start file contains any non-integer string,
+        # cronscripts/reprocess-hwdb-submissions.py reports an error.
+        start_file_name = mktemp()
+        start_file = open(start_file_name, 'w')
+        start_file.write('-1')
+        start_file.close()
+        retcode, stdout, stderr = run_script(
+            'cronscripts/reprocess-hwdb-submissions.py',
+            ['--start-file', start_file_name])
+        self.assertThat(
+            stderr,
+            Contains('%s must contain a positive integer' % start_file_name))
+
+    # Disabled due to bug 849056.
+    def xxx_test_run_reprocessing_script_max_submission_not_integer(self):
+        # If the parameter --max-submissions is not an integer,
+        # cronscripts/reprocess-hwdb-submissions.py reports an error.
+        retcode, stdout, stderr = run_script(
+            'cronscripts/reprocess-hwdb-submissions.py',
+            ['--max-submissions', 'nonsense'])
+        expected = "Invalid value for --max_submissions specified: 'nonsense'"
+        self.assertThat(stderr, Contains(expected))
+
+    def test_run_reprocessing_script_two_batches(self):
+        # cronscripts/reprocess-hwdb-submissions.py begings to process
+        # submissions with IDs starting at the value stored in the
+        # file given as the parameter --start-file. When is has
+        # finished processing the number of submissions specified by
+        # --max-submissions, it stores the ID of the last prcessed
+        # submission in start-file.
+        new_submissions = []
+        for count in range(5):
+            new_submissions.append(
+                self.factory.makeHWSubmission(
+                    status=HWSubmissionProcessingStatus.INVALID))
+
+        start_file_name = mktemp()
+        start_file = open(start_file_name, 'w')
+        start_file.write('%i' % new_submissions[1].id)
+        start_file.close()
+        transaction.commit()
+        Store.of(new_submissions[0]).invalidate()
+
+        retcode, stdout, stderr = run_script(
+            'cronscripts/reprocess-hwdb-submissions.py',
+            ['--max-submissions', '2', '--start-file', start_file_name])
+
+        # We started with the ID of the second submission created abvoe,
+        # so the first submission still has the status INVALID.
+        self.assertEqual(
+            HWSubmissionProcessingStatus.INVALID,
+            new_submissions[0].status)
+        # We processed two submissions, they now have the status
+        # PROCESSED.
+        self.assertEqual(
+            HWSubmissionProcessingStatus.PROCESSED,
+            new_submissions[1].status)
+        self.assertEqual(
+            HWSubmissionProcessingStatus.PROCESSED,
+            new_submissions[2].status)
+        # The  following submissions were not yet touched,
+        self.assertEqual(
+            HWSubmissionProcessingStatus.INVALID,
+            new_submissions[3].status)
+        self.assertEqual(
+            HWSubmissionProcessingStatus.INVALID,
+            new_submissions[4].status)
+
+        # The start file now contains the ID of the 4th submission.
+        new_start = int(open(start_file_name).read())
+        self.assertEqual(new_submissions[3].id, new_start)
+
+        # When we run the script again, for only one submission,
+        # the 4th submission is processed.
+        transaction.abort()
+        Store.of(new_submissions[0]).invalidate()
+        retcode, stdout, stderr = run_script(
+            'cronscripts/reprocess-hwdb-submissions.py',
+            ['--max-submissions', '1', '--start-file', start_file_name])
+        self.assertEqual(
+            HWSubmissionProcessingStatus.PROCESSED,
+            new_submissions[3].status)
+        self.assertEqual(
+            HWSubmissionProcessingStatus.INVALID,
+            new_submissions[4].status)