launchpad-reviewers team mailing list archive

[Merge] lp:~wgrant/launchpad/bug-694001-apache-username-spaces into lp:launchpad

 

William Grant has proposed merging lp:~wgrant/launchpad/bug-694001-apache-username-spaces into lp:launchpad.

Requested reviews:
  Launchpad code reviewers (launchpad-reviewers)
Related bugs:
  #694001 Apache log parser doesn't handle usernames with spaces
  https://bugs.launchpad.net/bugs/694001

For more details, see:
https://code.launchpad.net/~wgrant/launchpad/bug-694001-apache-username-spaces/+merge/46720

Private PPA Apache logs contain unquoted usernames, which can be whatever the user wants -- even containing spaces or other strange characters. Since the default Apache combined log format uses spaces to delimit fields, the parser grabs fields with \S+ unless they are quoted. This makes it choke on lines with usernames containing spaces.

This branch fixes contrib.apachelog to match usernames with spaces. The username is the only field in the default log format that can contain spaces, so lines are still deterministically parsable.
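
As a rough illustration of why a non-greedy match on the username stays unambiguous, here is a minimal sketch. It assumes a simplified pattern covering only the first seven fields; LINE_RE and parse() are hypothetical names for this example and are not the regex or API that contrib.apachelog actually builds.

```python
import re

# Simplified stand-in for a pattern built from '%h %l %u %t "%r" %>s %b'.
# This is an illustrative assumption, not contrib.apachelog's real regex.
# The username group is non-greedy, so it grows only until the next
# " [" that starts the bracketed timestamp.
LINE_RE = re.compile(
    r'^(\S+) (\S+) (.+?) (\[[^\]]+\]) "([^"]*)" (\S+) (\S+)$'
)


def parse(line):
    """Return (host, ident, user, date, request, status, size)."""
    match = LINE_RE.match(line)
    if match is None:
        raise ValueError("unparsable line: %r" % line)
    return match.groups()


line = ('1.1.1.1 - Some User [25/Jan/2009:15:48:07 +0000] '
        '"GET /10133748/cramfsswap_1.4.1.tar.gz HTTP/1.0" 200 12341')
host, ident, user, date, request, status, size = parse(line)
```

With this sketch the username comes back as a single field even though it contains a space, and a plain "-" username still parses the same way, because the bracketed timestamp gives the non-greedy group a fixed place to stop.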
-- 
https://code.launchpad.net/~wgrant/launchpad/bug-694001-apache-username-spaces/+merge/46720
Your team Launchpad code reviewers is requested to review the proposed merge of lp:~wgrant/launchpad/bug-694001-apache-username-spaces into lp:launchpad.
=== modified file 'lib/contrib/apachelog.py'
--- lib/contrib/apachelog.py	2009-04-29 19:10:17 +0000
+++ lib/contrib/apachelog.py	2011-01-19 00:07:49 +0000
@@ -159,7 +159,7 @@
             elif findpercent.search(element):
                 subpattern = r'(\[[^\]]+\])'
                 
-            elif element == '%U':
+            elif element in ('%U', '%u'):
                 subpattern = '(.+?)'
             
             subpatterns.append(subpattern)

=== modified file 'lib/lp/services/apachelogparser/tests/test_apachelogparser.py'
--- lib/lp/services/apachelogparser/tests/test_apachelogparser.py	2011-01-05 04:56:11 +0000
+++ lib/lp/services/apachelogparser/tests/test_apachelogparser.py	2011-01-19 00:07:49 +0000
@@ -68,6 +68,22 @@
         self.assertEqual(
             request, 'GET /10133748/cramfsswap_1.4.1.tar.gz HTTP/1.0')
 
+    def test_parsing_line_with_spaces_in_username(self):
+        # Some lines have spaces in the username, left unquoted by
+        # Apache. They can still be parsed OK, since no other fields
+        # have similar issues.
+        line = (r'1.1.1.1 - Some User [25/Jan/2009:15:48:07 +0000] "GET '
+                r'/10133748/cramfsswap_1.4.1.tar.gz HTTP/1.0" 200 12341 '
+                r'"http://foo.bar/?baz=\"bang\"" '
+                r'"\"Nokia2630/2.0 (05.20) Profile/MIDP-2.1 '
+                r'Configuration/CLDC-1.1\""')
+        host, date, status, request = get_host_date_status_and_request(line)
+        self.assertEqual(host, '1.1.1.1')
+        self.assertEqual(date, '[25/Jan/2009:15:48:07 +0000]')
+        self.assertEqual(status, '200')
+        self.assertEqual(
+            request, 'GET /10133748/cramfsswap_1.4.1.tar.gz HTTP/1.0')
+
     def test_day_extraction(self):
         date = '[13/Jun/2008:18:38:57 +0100]'
         self.assertEqual(get_day(date), datetime(2008, 6, 13))