launchpad-reviewers team mailing list archive

[Merge] lp:~wgrant/launchpad/bug-702819-bad-uris into lp:launchpad

 

William Grant has proposed merging lp:~wgrant/launchpad/bug-702819-bad-uris into lp:launchpad.

Requested reviews:
  Launchpad code reviewers (launchpad-reviewers)
Related bugs:
  #702819 Log parser should skip lines raising InvalidURIError
  https://bugs.launchpad.net/bugs/702819

For more details, see:
https://code.launchpad.net/~wgrant/launchpad/bug-702819-bad-uris/+merge/46683

When the Apache log parser encounters an unparsable URI, it currently skips the remainder of the file. This branch changes get_method_and_path to catch InvalidURIError and pass the unparsable URL through unchanged as the path, so that the specific parser implementation skips only that line and continues with the rest of the file.
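
The intended behaviour can be sketched in isolation roughly as follows. This is a minimal illustration only, not the code in this branch: extract_path and normalise_path are hypothetical names, the InvalidURIError class below is a local stand-in for lazr.uri.InvalidURIError, and the "reject any URL containing whitespace" rule is an assumption made so the sketch runs without lazr.uri installed.

```python
from urllib.parse import urlsplit


class InvalidURIError(ValueError):
    """Local stand-in for lazr.uri.InvalidURIError (assumption)."""


def extract_path(url):
    """Hypothetical replacement for URI(url).path.

    Assumption: a URL containing whitespace counts as invalid.
    """
    if any(ch.isspace() for ch in url):
        raise InvalidURIError(url)
    return urlsplit(url).path


def normalise_path(path):
    """Reduce an absolute URL to its path, tolerating invalid URLs."""
    if path.startswith(('http://', 'https://')):
        try:
            path = extract_path(path)
        except InvalidURIError:
            # Leave the value untouched; the caller decides whether
            # to skip the line, instead of the whole file aborting.
            pass
    return path


# A caller can then skip only the bad line and keep going.
requests = [
    'http://blah/1234/fewfwfw',
    'http://blah/1234/fewfwfw GET http://blah',
    '/5678/other',
]
paths = [normalise_path(r) for r in requests]
usable = [p for p in paths if p.startswith('/')]
```

With this shape, the exception never propagates out of the per-line helper, so one malformed request line costs a single skipped entry rather than the remainder of the log file.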
-- 
https://code.launchpad.net/~wgrant/launchpad/bug-702819-bad-uris/+merge/46683
Your team Launchpad code reviewers is requested to review the proposed merge of lp:~wgrant/launchpad/bug-702819-bad-uris into lp:launchpad.
=== modified file 'lib/lp/services/apachelogparser/base.py'
--- lib/lp/services/apachelogparser/base.py	2011-01-05 04:56:11 +0000
+++ lib/lp/services/apachelogparser/base.py	2011-01-18 21:41:57 +0000
@@ -6,7 +6,7 @@
 import os
 
 from contrib import apachelog
-from lazr.uri import URI
+from lazr.uri import InvalidURIError, URI
 import pytz
 from zope.component import getUtility
 
@@ -218,6 +218,12 @@
         # This is the common case.
         path = first
     if path.startswith('http://') or path.startswith('https://'):
-        uri = URI(path)
-        path = uri.path
+        try:
+            uri = URI(path)
+            path = uri.path
+        except InvalidURIError:
+            # The URL is not valid, so we can't extract a path. Let it
+            # pass through, where it will probably be skipped as
+            # unparsable.
+            pass
     return method, path

=== modified file 'lib/lp/services/apachelogparser/tests/test_apachelogparser.py'
--- lib/lp/services/apachelogparser/tests/test_apachelogparser.py	2011-01-05 04:56:11 +0000
+++ lib/lp/services/apachelogparser/tests/test_apachelogparser.py	2011-01-18 21:41:57 +0000
@@ -101,6 +101,15 @@
             path,
             r'/56222647/deluge-gtk_1.3.0-0ubuntu1_all.deb?N\x1f\x9b Z%7B...')
 
+    def test_parsing_invalid_url(self):
+        # See bug 702819.
+        request = r'GET http://blah/1234/fewfwfw GET http://blah HTTP/1.0'
+        method, path = get_method_and_path(request)
+        self.assertEqual(method, 'GET')
+        self.assertEqual(
+            path,
+            r'http://blah/1234/fewfwfw GET http://blah')
+
 
 class Test_get_fd_and_file_size(TestCase):