launchpad-reviewers team mailing list archive
Message #02030
[Merge] lp:~julian-edwards/launchpad/log-parser-bug-680463 into lp:launchpad
Julian Edwards has proposed merging lp:~julian-edwards/launchpad/log-parser-bug-680463 into lp:launchpad.
Requested reviews:
Launchpad code reviewers (launchpad-reviewers)
Related bugs:
#680463 Apache log parser crashes out on large gzip files
https://bugs.launchpad.net/bugs/680463
Figure out the length of gzip log files without having to read them into memory.
The existing code reads the entire uncompressed contents of a gzip file into memory, which makes the PPA log parser blow up quite horribly because the log files are very large.
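The fix relies on the gzip trailer: per RFC 1952, the last 4 bytes of a gzip stream store the uncompressed size, little-endian, modulo 2**32. A standalone sketch of that trick, using the public struct module instead of the gzip module's internal read32 helper (the function name here is hypothetical, not part of the branch):

```python
import os
import struct


def gzip_uncompressed_size(file_path):
    """Return a gzip file's uncompressed size without decompressing it.

    Reads the 4-byte ISIZE trailer (RFC 1952): the uncompressed length,
    little-endian, modulo 2**32. Wrong for multi-member archives or
    files whose uncompressed size exceeds 4 GiB, but avoids reading the
    whole file into memory.
    """
    with open(file_path, 'rb') as fd:
        # Seek to the last 4 bytes of the compressed stream.
        fd.seek(-4, os.SEEK_END)
        # '<I' = little-endian unsigned 32-bit integer.
        return struct.unpack('<I', fd.read(4))[0]
```

For log files under 4 GiB uncompressed, as here, the modulo-2**32 caveat is harmless.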
Use existing test with:
bin/test -cvv test_apachelogparser Test_get_fd_and_file_size
QA Plan
-------
I have a copy of the production log files that cause the crash on dogfood. Running with the fix allows processing to continue, with no increase in memory usage as observed in "top".
--
https://code.launchpad.net/~julian-edwards/launchpad/log-parser-bug-680463/+merge/41865
Your team Launchpad code reviewers is requested to review the proposed merge of lp:~julian-edwards/launchpad/log-parser-bug-680463 into lp:launchpad.
=== modified file 'lib/lp/services/apachelogparser/base.py'
--- lib/lp/services/apachelogparser/base.py 2010-11-17 23:20:07 +0000
+++ lib/lp/services/apachelogparser/base.py 2010-11-25 14:21:23 +0000
@@ -64,13 +64,14 @@
file_path points to a gzipped file.
"""
if file_path.endswith('.gz'):
+ # The last 4 bytes of the file contains the uncompressed file's
+ # size, modulo 2**32. This code is somewhat stolen from the gzip
+ # module in Python 2.6.
fd = gzip.open(file_path)
- # There doesn't seem to be a better way of figuring out the
- # uncompressed size of a file, so we'll read the whole file here.
- file_size = len(fd.read())
- # Seek back to the beginning of the file as if we had just opened
- # it.
- fd.seek(0)
+ fd.fileobj.seek(-4, os.SEEK_END)
+ isize = gzip.read32(fd.fileobj) # may exceed 2GB
+ file_size = isize & 0xffffffffL
+ fd.fileobj.seek(0)
else:
fd = open(file_path)
file_size = os.path.getsize(file_path)