apport-hackers team mailing list archive

Thread
Date

Re: [Merge] lp:~brian-murray/apport/zgrep-fallback into lp:apport

To: Brian Murray <brian@xxxxxxxxxx>
From: Martin Pitt <martin.pitt@xxxxxxxxxx>
Date: Tue, 08 Nov 2016 10:47:54 -0000
In-reply-to: <20161107183039.32178.44759.launchpad@ackee.canonical.com>
Reply-to: mp+310218@xxxxxxxxxxxxxxxxxx
Sender: bounces@xxxxxxxxxxxxx

I can't say that I like having two code paths for the same thing, but I accept that it's necessary. (Still, if these machines are that tight on RAM, can we increase that a bit?)

I would like this to be a bit faster and simpler though, see inline comment. Thanks!

Diff comments:

> === modified file 'backends/packaging-apt-dpkg.py'
> --- backends/packaging-apt-dpkg.py	2016-08-13 07:09:38 +0000
> +++ backends/packaging-apt-dpkg.py	2016-11-07 18:30:14 +0000
> @@ -1221,9 +1222,25 @@
>  
>              # zgrep is magnitudes faster than a 'gzip.open/split() loop'
>              package = None
> -            zgrep = subprocess.Popen(['zgrep', '-m1', '^%s[[:space:]]' % file, map],
> -                                     stdout=subprocess.PIPE, stderr=subprocess.PIPE)
> -            out = zgrep.communicate()[0].decode('UTF-8')
> +            try:
> +                zgrep = subprocess.Popen(['zgrep', '-m1', '^%s[[:space:]]' % file, map],
> +                                         stdout=subprocess.PIPE, stderr=subprocess.PIPE)
> +                out = zgrep.communicate()[0].decode('UTF-8')
> +            except OSError as e:
> +                if e.errno != errno.ENOMEM:
> +                    raise
> +                import gzip
> +                with gzip.open('%s' % map, 'rb') as contents:
> +                    out = ''
> +                    for line in contents:
> +                        try:
> +                            line = line.decode('UTF-8').rstrip('\n')
> +                        # 2016-11-01 this should be better
> +                        except UnicodeDecodeError:
> +                            continue

The decode() makes things slower, and this exception handling is ugly -- I'd rather recommend to do "file_b = file.encode()" and do the startswith() test for file_b. This works fine on bytes, doesn't need decoding, and is simpler and faster.

> +                        if line.startswith(file):
> +                            out = line
> +                            break
>              # we do not check the return code, since zgrep -m1 often errors out
>              # with 'stdout: broken pipe'
>              if out:


-- 
https://code.launchpad.net/~brian-murray/apport/zgrep-fallback/+merge/310218
Your team Apport upstream developers is requested to review the proposed merge of lp:~brian-murray/apport/zgrep-fallback into lp:apport.

References

[Merge] lp:~brian-murray/apport/zgrep-fallback into lp:apport
From: Brian Murray, 2016-11-07