apport-hackers team mailing list archive

[Merge] lp:~brian-murray/apport/zgrep-fallback into lp:apport

To: mp+310218@xxxxxxxxxxxxxxxxxx
From: Brian Murray <brian@xxxxxxxxxx>
Date: Mon, 07 Nov 2016 18:30:40 -0000
Reply-to: mp+310218@xxxxxxxxxxxxxxxxxx
Sender: bounces@xxxxxxxxxxxxx

Brian Murray has proposed merging lp:~brian-murray/apport/zgrep-fallback into lp:apport.

Requested reviews:
  Apport upstream developers (apport-hackers)

For more details, see:
https://code.launchpad.net/~brian-murray/apport/zgrep-fallback/+merge/310218

The production retracers for the Error Tracker were OOM'ing regularly when trying to use zgrep to search Contents.gz for files found in the crash report. While zgrep is faster than using gzip and reading the file line by line, the latter still seems like a good fallback option and is better than having the retrace process crash. I've implemented the proposed change in the production version of the Error Tracker and have encountered no issues with it.

As the comment in the diff notes, this part could likely be better:

+                        try:
+                            line = line.decode('UTF-8').rstrip('\n')
+                        # 2016-11-01 this should be better
+                        except UnicodeDecodeError:
+                            continue

I added it because of lines like the following in Contents.gz for yakkety, where the path is not valid UTF-8:

 $ zgrep -a "lenska.alias" /mnt/storage/archive-mirror/dists/yakkety/Contents-amd64.gz
usr/lib/aspell/�slenska.alias                               universe/text/aspell-is
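
One possible way to make the skipped-line handling better, as a rough sketch and not what the branch does: compare bytes instead of decoded text, so a non-UTF-8 path elsewhere in the file cannot raise, and decode only the matching line. The helper name find_contents_line and the choice of errors='replace' are assumptions made for illustration. Unlike zgrep, this treats the file name literally, not as a regular expression.

```python
import gzip


def find_contents_line(contents_path, file_path):
    """Return the first Contents line whose path column is file_path, or ''.

    Hypothetical helper for illustration; not part of the proposed branch.
    """
    # Search in bytes so undecodable lines never need to be decoded.
    needle = file_path.encode('UTF-8')
    with gzip.open(contents_path, 'rb') as contents:
        for raw in contents:
            if not raw.startswith(needle):
                continue
            # Roughly mirror zgrep's '^%s[[:space:]]': the path must be
            # followed by whitespace, so 'usr/bin/foo' does not match a
            # line that starts with 'usr/bin/foobar'.
            following = raw[len(needle):len(needle) + 1]
            if following.isspace():
                # Assumption: replacing undecodable bytes is acceptable here.
                return raw.decode('UTF-8', errors='replace').rstrip('\n')
    return ''
```

The result could be assigned to out in place of the loop in the fallback branch. Requiring whitespace after the path also avoids the prefix matches that a plain startswith(file) check would accept.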

Thanks!
-- 
Your team Apport upstream developers is requested to review the proposed merge of lp:~brian-murray/apport/zgrep-fallback into lp:apport.

=== modified file 'backends/packaging-apt-dpkg.py'
--- backends/packaging-apt-dpkg.py	2016-08-13 07:09:38 +0000
+++ backends/packaging-apt-dpkg.py	2016-11-07 18:30:14 +0000
@@ -13,6 +13,7 @@
 # the full text of the license.
 
 import subprocess, os, glob, stat, sys, tempfile, shutil, time
+import errno
 import hashlib
 import json
 
@@ -1221,9 +1222,25 @@
 
             # zgrep is magnitudes faster than a 'gzip.open/split() loop'
             package = None
-            zgrep = subprocess.Popen(['zgrep', '-m1', '^%s[[:space:]]' % file, map],
-                                     stdout=subprocess.PIPE, stderr=subprocess.PIPE)
-            out = zgrep.communicate()[0].decode('UTF-8')
+            try:
+                zgrep = subprocess.Popen(['zgrep', '-m1', '^%s[[:space:]]' % file, map],
+                                         stdout=subprocess.PIPE, stderr=subprocess.PIPE)
+                out = zgrep.communicate()[0].decode('UTF-8')
+            except OSError as e:
+                if e.errno != errno.ENOMEM:
+                    raise
+                import gzip
+                with gzip.open('%s' % map, 'rb') as contents:
+                    out = ''
+                    for line in contents:
+                        try:
+                            line = line.decode('UTF-8').rstrip('\n')
+                        # 2016-11-01 this should be better
+                        except UnicodeDecodeError:
+                            continue
+                        if line.startswith(file):
+                            out = line
+                            break
             # we do not check the return code, since zgrep -m1 often errors out
             # with 'stdout: broken pipe'
             if out:

Follow ups

Re: [Merge] lp:~brian-murray/apport/zgrep-fallback into lp:apport
From: Brian Murray, 2016-11-08
Re: [Merge] lp:~brian-murray/apport/zgrep-fallback into lp:apport
From: Martin Pitt, 2016-11-08