launchpad-reviewers team mailing list archive

Thread
Date
Re: [Merge] lp:~blr/launchpad/bug-1334577-verbose-diff into lp:launchpad

To: mp+243751@xxxxxxxxxxxxxxxxxx
From: William Grant <me@xxxxxxxxxxxxxxxxxx>
Date: Fri, 05 Dec 2014 02:46:05 -0000
In-reply-to: <20141205021122.14940.13536.launchpad@ackee.canonical.com>
Reply-to: mp+243751@xxxxxxxxxxxxxxxxxx
Sender: bounces@xxxxxxxxxxxxx
Review: Needs Fixing code



Diff comments:

> === modified file 'lib/lp/code/mail/codereviewcomment.py'
> --- lib/lp/code/mail/codereviewcomment.py	2014-05-21 08:52:53 +0000
> +++ lib/lp/code/mail/codereviewcomment.py	2014-12-05 02:10:28 +0000
> @@ -10,6 +10,7 @@
>      'CodeReviewCommentMailer',
>      ]
>  
> +from bzrlib import patches
>  
>  from zope.component import getUtility

bzrlib and zope are both external but not stdlib packages, so this blank shouldn't be here. utilities/format-imports can fix it for you.

>  from zope.security.proxy import removeSecurityProxy
> @@ -169,18 +170,90 @@
>                  content, content_type=content_type, filename=filename)
>  
>  
> +def comment_in_hunk(hunk, comments, line_count):
> +    """ Check if comment exists in hunk lines. """

We don't normally put spaces around the text of docstrings.

> +
> +    # check comment in context line
> +    comment = comments.get(str(line_count))
> +    if comment is not None:
> +        return True
> +
> +    # check comment in hunk lines
> +    for line in hunk.lines:
> +        line_count = line_count + 1
> +        comment = comments.get(str(line_count))
> +        if comment is not None:
> +            return True
> +    return False
> +
> +
> +def format_comment(comment):
> +    """ Returns a list of correctly formatted comment(s). """
> +    comment_lines = []
> +    if comment is not None:
> +        comment_lines.append('')
> +        for comment in comment.splitlines():
> +            comment_lines.append(u'%s' % comment.decode('utf-8', 'replace'))

As we discovered, some of the tests were buggy and used strs instead of unicodes. I think you fixed that, so comment should always be unicode already here, so you can just comment_lines.extend(comment.splitlines()).

> +        comment_lines.append('')
> +    return comment_lines
> +
> +
> +def format_patch_header(patch):
> +    """ Returns a list of correctly formmated patch headers. """

"formatted"

> +    patch_header_lines = []
> +    for p in patch.get_header().splitlines():
> +        patch_header_lines.append('> {0}'.format(p))
> +    return patch_header_lines
> +
> +
>  def build_inline_comments_section(comments, diff_text):
> -    """Return a formatted text section with contextualized comments."""
> +    """ Return a formatted text section with contextualized comments.
> +
> +    Hunks without comments are skipped to limit verbosity.
> +    Comments can be rendered after patch headers, hunk context lines,
> +    and hunk lines.
> +    """
> +    diff_lines = diff_text.splitlines(True)
> +    # allow_dirty() will preserve text not conforming to unified diff
> +    diff_patches = patches.parse_patches(diff_lines, allow_dirty=True)
>      result_lines = []
> -    diff_lines = diff_text.splitlines()
> -    for num, line in enumerate(diff_lines, 1):
> -        result_lines.append(u'> {0}'.format(line.decode('utf-8', 'replace')))
> -        comment = comments.get(str(num))
> +    line_count = 0
> +
> +    for patch in diff_patches:
> +        header_set = False
> +        line_count = line_count + 2  # inc patch headers
> +        comment = comments.get(str(line_count))

What happens if there's a comment on the first line of the header? Weird, but possible, and this will miss it.

Also, if you look at this very diff you'll see that there are === lines before the --- and +++ lines. These also occur if you eg. chmod +x a file. I don't think this handles those cases. It's possible that bzrlib.patches needs adjustment to tell you about those.

> +
>          if comment is not None:
> -            result_lines.append('')
> -            result_lines.extend(comment.splitlines())
> -            result_lines.append('')
> -
> -    result_text = u'\n'.join(result_lines)
> -
> +            # comment in patch header, add header and comment
> +            result_lines.extend(format_patch_header(patch))
> +            result_lines.extend(format_comment(comment))
> +            header_set = True

This duplication is a bit weird. Would it be cleaner to compile a list of lines for just this file, starting with the header and then adding its comment any hunks with comments and their comments, and then only adding that to result_lines if we found any comments while creating it?

The same approach might work for individual hunks, too, preventing the need for two passes.

> +
> +        for hunk in patch.hunks:
> +            line_count = line_count + 1  # inc hunk context line
> +
> +            if comment_in_hunk(hunk, comments, line_count):
> +                if not header_set:
> +                    result_lines.extend(format_patch_header(patch))
> +                    header_set = True
> +
> +                # add context line (hunk header)
> +                result_lines.append(u'> %s' % hunk.get_header().rstrip('\n'))

What if there's a non-ASCII filename?

> +
> +                # comment for context line (hunk header)
> +                comment = comments.get(str(line_count))
> +                if comment is not None:
> +                    result_lines.extend(format_comment(comment))
> +
> +                for line in hunk.lines:
> +                    line_count = line_count + 1  # inc hunk lines
> +                    result_lines.append(u'> %s' % str(
> +                        line).rstrip('\n').decode('utf-8', 'replace'))

str(some_str) is always pointless (it's the identity function), and str(some_unicode) is always buggy (it will crash on non-ASCII characters). You don't need either here.

> +                    comment = comments.get(str(line_count))
> +                    result_lines.extend(format_comment(comment))
> +            else:
> +                line_count = line_count + len(hunk.lines)  # inc hunk lines

=+?

> +
> +    result_text = '\n'.join(result_lines).encode('utf-8')
>      return '\n\nDiff comments:\n\n%s\n\n' % result_text

This is included in the email body, which we treat as Unicode so it can be encoded appropriately by our mail infrastructure later. So just return the unicode here, not a str.

> 
> === modified file 'lib/lp/code/mail/tests/test_codereviewcomment.py'
> --- lib/lp/code/mail/tests/test_codereviewcomment.py	2014-08-19 04:56:39 +0000
> +++ lib/lp/code/mail/tests/test_codereviewcomment.py	2014-12-05 02:10:28 +0000
> @@ -239,7 +239,7 @@
>          See `build_inline_comments_section` tests for formatting details.
>          """
>          comment = self.makeCommentWithInlineComments(
> -            inline_comments={'2': u'Is this from Planet Earth\xa9 ?'})
> +            inline_comments={'2': 'Is this from Pl\u0060net Earth ?'})
>          mailer = CodeReviewCommentMailer.forCreation(comment)
>          commenter = comment.branch_merge_proposal.registrant
>          ctrl = mailer.generateEmail(
> @@ -249,16 +249,15 @@
>              '',
>              'Diff comments:',
>              '',
> -            "> === zbqvsvrq svyr 'yvo/yc/pbqr/vagresnprf/qvss.cl'",
> -            '> --- yvo/yc/pbqr/vagresnprf/qvss.cl      '
> -            '2009-10-01 13:25:12 +0000',
> -            '',
> -            u'Is this from Planet Earth\xa9 ?',
> -            '',
> -            '> +++ yvo/yc/pbqr/vagresnprf/qvss.cl      '
> -            '2010-02-02 15:48:56 +0000'
> +            ('> --- yvo/yc/pbqr/vagresnprf/qvss.cl      '
> +             '2009-10-01 13:25:12 +0000'),
> +            ('> +++ yvo/yc/pbqr/vagresnprf/qvss.cl      '
> +             '2010-02-02 15:48:56 +0000'),
> +            '',
> +            'Is this from Pl\u0060net Earth ?',
> +            '',
>          ]
> -        self.assertEqual(expected_lines, ctrl.body.splitlines()[1:10])
> +        self.assertEqual(expected_lines, ctrl.body.splitlines()[1:9])
>  
>      def makeComment(self, email_message):
>          message = getUtility(IMessageSet).fromEmail(email_message.as_string())
> @@ -389,6 +388,9 @@
>          "@@ -0,0 +1,2 @@\n"
>          "+a\n"
>          "+b\n"
> +        "@@ -1,2 +0,0 @@\n"
> +        "-x\n"
> +        "-y\n"
>          "--- foo\t2009-08-26 15:53:23.000000000 -0400\n"
>          "+++ foo\t2009-08-26 15:56:43.000000000 -0400\n"
>          "@@ -1,3 +1,4 @@\n"
> @@ -408,44 +410,64 @@
>          # The inline comments section starts with a 4-lines header
>          # (empty lines and title) and ends with an empty line.
>          section = self.getSection({}).splitlines()
> -        header = section[:5]
> +        header = section[:4]
>          self.assertEqual(
>              ['',
>               '',
>               'Diff comments:',
> -             '',
> -             '> --- bar\t2009-08-26 15:53:34.000000000 -0400'],
> -            header)
> -        footer = section[-2:]
> +             ''], header)
> +        footer = section[-1:]
>          self.assertEqual(
> -            ['> +e',
> -             ''],
> +            [''],
>              footer)
>  
>      def test_single_line_comment(self):
>          # The inline comments are correctly contextualized in the diff.
>          # and prefixed with '>>> '
> -        comments = {'2': u'\u03b4\u03bf\u03ba\u03b9\u03bc\u03ae'}
> -        self.assertEqual(
> -            ['> +++ bar\t1969-12-31 19:00:00.000000000 -0500',
> -             '',
> -             u'\u03b4\u03bf\u03ba\u03b9\u03bc\u03ae',
> -             '',
> -             '> @@ -1,3 +0,0 @@',
> -             u'> -\xe5'],
> -            self.getSection(comments).splitlines()[5:11])
> +        comments = {'2': '\u03b4\u03bf\u03ba\u03b9\u03bc\u03ae'}
> +        self.assertEqual(
> +            map(unicode, ['> +++ bar\t1969-12-31 19:00:00.000000000 -0500',
> +             '',
> +             '\u03b4\u03bf\u03ba\u03b9\u03bc\u03ae',
> +             '']),
> +            self.getSection(comments).splitlines()[5:9])
> +
> +    def test_commentless_hunks_ignored(self):
> +        comments = {'14': 'A comment', '19': 'Another comment'}
> +        self.assertEqual(
> +            map(unicode, ['> --- baz\t1969-12-31 19:00:00.000000000 -0500',
> +             '> +++ baz\t2009-08-26 15:53:57.000000000 -0400',
> +             '> @@ -1,2 +0,0 @@',
> +             '> -x',
> +             '> -y',
> +             '',
> +             'A comment',
> +             '',
> +             '> --- foo\t2009-08-26 15:53:23.000000000 -0400',
> +             '> +++ foo\t2009-08-26 15:56:43.000000000 -0400',
> +             '> @@ -1,3 +1,4 @@',
> +             '>  a',
> +             '> -b',
> +             '',
> +             'Another comment',
> +             '',
> +             '>  c',
> +             '> +d',
> +             '> +e']),
> +            self.getSection(comments).splitlines()[4:23])
>  
>      def test_multi_line_comment(self):
>          # Inline comments with multiple lines are rendered appropriately.
>          comments = {'2': 'Foo\nBar'}
>          self.assertEqual(
> -            ['> +++ bar\t1969-12-31 19:00:00.000000000 -0500',
> +            map(unicode, ['> --- bar\t2009-08-26 15:53:34.000000000 -0400',
> +             '> +++ bar\t1969-12-31 19:00:00.000000000 -0500',
>               '',
>               'Foo',
>               'Bar',
>               '',
> -             '> @@ -1,3 +0,0 @@'],
> -            self.getSection(comments).splitlines()[5:11])
> +             '']),
> +            self.getSection(comments).splitlines()[4:])
>  
>      def test_multiple_comments(self):
>          # Multiple inline comments are redered appropriately.
> 


-- 
https://code.launchpad.net/~blr/launchpad/bug-1334577-verbose-diff/+merge/243751
Your team Launchpad code reviewers is subscribed to branch lp:launchpad.
References

[Merge] lp:~blr/launchpad/bug-1334577-verbose-diff into lp:launchpad
From: Bayard 'kit' Randel, 2014-12-05