launchpad-reviewers team mailing list archive
-
launchpad-reviewers team
-
Mailing list archive
-
Message #04835
lp:~adeuring/launchpad/fix-broken-comment-nodes-in-hwdb-reports into lp:launchpad
Abel Deuring has proposed merging lp:~adeuring/launchpad/fix-broken-comment-nodes-in-hwdb-reports into lp:launchpad.
Requested reviews:
Launchpad code reviewers (launchpad-reviewers)
For more details, see:
https://code.launchpad.net/~adeuring/launchpad/fix-broken-comment-nodes-in-hwdb-reports/+merge/73552
This branch allows us to process a considerable number of broken HWDB reporte. It turnned out that many reports from Lucid (perhaps maverick too) contain ESC characters, which cElementTree does not like.
This branch takes a trival approach to process them nevertheless: Simply replace the content of each <comment> node with <comment/>. We don't use that data anyway.
There was a test that a <comment> node may not contain any sub-nodes. That test fails with this change, because all content of this node is deleted before the validator runs, so I simply deleted the test -- it is now pointless.
test: ./bin/test hardwaredb -vvt lp.hardwaredb.tests.test_hwdb_submission_validation
There are lots of lint complaints (tech debt...), but I am too tired to fix them now...
--
https://code.launchpad.net/~adeuring/launchpad/fix-broken-comment-nodes-in-hwdb-reports/+merge/73552
Your team Launchpad code reviewers is requested to review the proposed merge of lp:~adeuring/launchpad/fix-broken-comment-nodes-in-hwdb-reports into lp:launchpad.
=== modified file 'lib/lp/hardwaredb/scripts/hwdbsubmissions.py'
--- lib/lp/hardwaredb/scripts/hwdbsubmissions.py 2011-08-29 20:44:36 +0000
+++ lib/lp/hardwaredb/scripts/hwdbsubmissions.py 2011-08-31 16:11:25 +0000
@@ -71,6 +71,8 @@
""",
re.VERBOSE)
+_broken_comment_nodes_re = re.compile('(<comment>.*?</comment>)', re.DOTALL)
+
ROOT_UDI = '/org/freedesktop/Hal/devices/computer'
UDEV_ROOT_PATH = '/devices/LNXSYSTM:00'
@@ -158,12 +160,21 @@
self.logger.warning(
'Parsing submission %s: %s' % (self.submission_key, message))
+ def fix_frequent_errors(self, submission):
+ """Fixes for frequent formal errors in the submissions.
+ """
+ # A considerable number of reports for Lucid has ESC characters
+ # in comment nodes. We don't need the comment nodes at all, so
+ # we can simply empty them.
+ return _broken_comment_nodes_re.sub('<comment/>', submission)
+
def _getValidatedEtree(self, submission, submission_key):
"""Create an etree doc from the XML string submission and validate it.
:return: an `lxml.etree` instance representation of a valid
submission or None for invalid submissions.
"""
+ submission = self.fix_frequent_errors(submission)
try:
tree = etree.parse(StringIO(submission), parser=self.doc_parser)
except SyntaxError, error_value:
=== modified file 'lib/lp/hardwaredb/tests/test_hwdb_submission_validation.py'
--- lib/lp/hardwaredb/tests/test_hwdb_submission_validation.py 2011-08-29 20:11:27 +0000
+++ lib/lp/hardwaredb/tests/test_hwdb_submission_validation.py 2011-08-31 16:11:25 +0000
@@ -99,6 +99,20 @@
self.assertEqual(result, None,
'Invalid root node not detected')
+ def testBadDataInCommentNode(self):
+ """Many submissions contain ESC symbols in <comment> nodes.
+
+ The cElementTree parser does not accept this; The processing
+ script deals with this by emptying all <comment> nodes before
+ building the element tree. (Note that we don't process data
+ from this node at all.)
+ """
+ bad_comment_node = "\n<comment>\x1b</comment>"
+ sample_data = self.replaceSampledata(
+ self.sample_data, bad_comment_node, "<comment>", "</comment>")
+ validated, submission_id = self.runValidator(sample_data)
+ self.assertTrue(validated is not None)
+
def _getLastOopsTime(self):
try:
last_oops_time = globalErrorUtility.getLastOopsReport().time
@@ -120,7 +134,8 @@
after=True)
# Add the OopsHandler to the log, because we want to make sure this
# doesn't create an Oops report.
- logging.getLogger('test_hwdb_submission_parser').addHandler(OopsHandler(self.log.name))
+ logging.getLogger('test_hwdb_submission_parser').addHandler(
+ OopsHandler(self.log.name))
result, submission_id = self.runValidator(sample_data)
last_oops_time = self._getLastOopsTime()
# We use the class method here, because it's been overrided for the
@@ -2695,26 +2710,6 @@
'Element driver has extra content: nonsense',
'detection of invalid sub-tag <nonsense> of <driver> in <target>')
- def testCommentTag(self):
- """Validation of the <comment> tag."""
- # This tag has no attributes.
- sample_data = self.sample_data.replace(
- '<comment>', '<comment foo="bar">')
- result, submission_id = self.runValidator(sample_data)
- self.assertErrorMessage(
- submission_id, result,
- 'Invalid attribute foo for element comment',
- 'detection of invalid attribute of <comment>')
-
- # Sub-tags are not allowed.
- sample_data = self.sample_data.replace(
- '<comment>', '<comment><nonsense/>')
- result, submission_id = self.runValidator(sample_data)
- self.assertErrorMessage(
- submission_id, result,
- 'Element comment has extra content: nonsense',
- 'detection of invalid sub-tag <nonsense> of <comment>')
-
def testMissingContextNode(self):
"""Validation of the <context> node."""
# The default sample data contains this node. It is not a