← Back to team overview

launchpad-reviewers team mailing list archive

lp:~adeuring/launchpad/fix-broken-comment-nodes-in-hwdb-reports into lp:launchpad

 

Abel Deuring has proposed merging lp:~adeuring/launchpad/fix-broken-comment-nodes-in-hwdb-reports into lp:launchpad.

Requested reviews:
  Launchpad code reviewers (launchpad-reviewers)

For more details, see:
https://code.launchpad.net/~adeuring/launchpad/fix-broken-comment-nodes-in-hwdb-reports/+merge/73552

This branch allows us to process a considerable number of broken HWDB reporte. It turnned out that many reports from Lucid (perhaps maverick too) contain ESC characters, which cElementTree does not like.

This branch takes a trival approach to process them nevertheless: Simply replace the content of each <comment> node with <comment/>. We don't use that data anyway.

There was a test that a <comment> node may not contain any sub-nodes. That test fails with this change, because all content of this node is deleted before the validator runs, so I simply deleted the test -- it is now pointless.

test: ./bin/test hardwaredb -vvt lp.hardwaredb.tests.test_hwdb_submission_validation

There are lots of lint complaints (tech debt...), but I am too tired to fix them now...

-- 
https://code.launchpad.net/~adeuring/launchpad/fix-broken-comment-nodes-in-hwdb-reports/+merge/73552
Your team Launchpad code reviewers is requested to review the proposed merge of lp:~adeuring/launchpad/fix-broken-comment-nodes-in-hwdb-reports into lp:launchpad.
=== modified file 'lib/lp/hardwaredb/scripts/hwdbsubmissions.py'
--- lib/lp/hardwaredb/scripts/hwdbsubmissions.py	2011-08-29 20:44:36 +0000
+++ lib/lp/hardwaredb/scripts/hwdbsubmissions.py	2011-08-31 16:11:25 +0000
@@ -71,6 +71,8 @@
     """,
     re.VERBOSE)
 
+_broken_comment_nodes_re = re.compile('(<comment>.*?</comment>)', re.DOTALL)
+
 ROOT_UDI = '/org/freedesktop/Hal/devices/computer'
 UDEV_ROOT_PATH = '/devices/LNXSYSTM:00'
 
@@ -158,12 +160,21 @@
             self.logger.warning(
                 'Parsing submission %s: %s' % (self.submission_key, message))
 
+    def fix_frequent_errors(self, submission):
+        """Fixes for frequent formal errors in the submissions.
+        """
+        # A considerable number of reports for Lucid has ESC characters
+        # in comment nodes. We don't need the comment nodes at all, so
+        # we can simply empty them.
+        return _broken_comment_nodes_re.sub('<comment/>', submission)
+
     def _getValidatedEtree(self, submission, submission_key):
         """Create an etree doc from the XML string submission and validate it.
 
         :return: an `lxml.etree` instance representation of a valid
             submission or None for invalid submissions.
         """
+        submission = self.fix_frequent_errors(submission)
         try:
             tree = etree.parse(StringIO(submission), parser=self.doc_parser)
         except SyntaxError, error_value:

=== modified file 'lib/lp/hardwaredb/tests/test_hwdb_submission_validation.py'
--- lib/lp/hardwaredb/tests/test_hwdb_submission_validation.py	2011-08-29 20:11:27 +0000
+++ lib/lp/hardwaredb/tests/test_hwdb_submission_validation.py	2011-08-31 16:11:25 +0000
@@ -99,6 +99,20 @@
         self.assertEqual(result, None,
                          'Invalid root node not detected')
 
+    def testBadDataInCommentNode(self):
+        """Many submissions contain ESC symbols in <comment> nodes.
+
+        The cElementTree parser does not accept this; The processing
+        script deals with this by emptying all <comment> nodes before
+        building the element tree. (Note that we don't process data
+        from this node at all.)
+        """
+        bad_comment_node = "\n<comment>\x1b</comment>"
+        sample_data = self.replaceSampledata(
+            self.sample_data, bad_comment_node, "<comment>", "</comment>")
+        validated, submission_id = self.runValidator(sample_data)
+        self.assertTrue(validated is not None)
+
     def _getLastOopsTime(self):
         try:
             last_oops_time = globalErrorUtility.getLastOopsReport().time
@@ -120,7 +134,8 @@
             after=True)
         # Add the OopsHandler to the log, because we want to make sure this
         # doesn't create an Oops report.
-        logging.getLogger('test_hwdb_submission_parser').addHandler(OopsHandler(self.log.name))
+        logging.getLogger('test_hwdb_submission_parser').addHandler(
+            OopsHandler(self.log.name))
         result, submission_id = self.runValidator(sample_data)
         last_oops_time = self._getLastOopsTime()
         # We use the class method here, because it's been overrided for the
@@ -2695,26 +2710,6 @@
             'Element driver has extra content: nonsense',
             'detection of invalid sub-tag <nonsense> of <driver> in <target>')
 
-    def testCommentTag(self):
-        """Validation of the <comment> tag."""
-        # This tag has no attributes.
-        sample_data = self.sample_data.replace(
-            '<comment>', '<comment foo="bar">')
-        result, submission_id = self.runValidator(sample_data)
-        self.assertErrorMessage(
-            submission_id, result,
-            'Invalid attribute foo for element comment',
-            'detection of invalid attribute of <comment>')
-
-        # Sub-tags are not allowed.
-        sample_data = self.sample_data.replace(
-            '<comment>', '<comment><nonsense/>')
-        result, submission_id = self.runValidator(sample_data)
-        self.assertErrorMessage(
-            submission_id, result,
-            'Element comment has extra content: nonsense',
-            'detection of invalid sub-tag <nonsense> of <comment>')
-
     def testMissingContextNode(self):
         """Validation of the <context> node."""
         # The default sample data contains this node. It is not a