launchpad-reviewers team mailing list archive

[Merge] ~cjwatson/launchpad:py3-messageset-decode-header into launchpad:master

To: mp+395936@xxxxxxxxxxxxxxxxxx
From: Colin Watson <cjwatson@xxxxxxxxxxxxx>
Date: Thu, 07 Jan 2021 17:29:22 -0000
Reply-to: mp+395936@xxxxxxxxxxxxxxxxxx
Sender: bounces@xxxxxxxxxxxxx

Colin Watson has proposed merging ~cjwatson/launchpad:py3-messageset-decode-header into launchpad:master.

Commit message:
Fix MessageSet._decode_header for Python 3

Requested reviews:
  Launchpad code reviewers (launchpad-reviewers)

For more details, see:
https://code.launchpad.net/~cjwatson/launchpad/+git/launchpad/+merge/395936

On Python 3, email.header.decode_header returns a (str, None) pair if the given header contains no RFC 2047 encoded words, even though it otherwise returns (bytes, charset) pairs.  Adjust MessageSet._decode_header to cope with this.
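
To illustrate the difference in return types, here is a minimal sketch using only the Python 3 standard library. The sample header values are made up for illustration and do not come from Launchpad.

```python
from email.header import decode_header

# No RFC 2047 encoded words: Python 3 hands back the original str.
plain = decode_header("Hello world")
print(plain)    # [('Hello world', None)]

# One encoded word: the payload comes back as bytes, with its charset.
encoded = decode_header("=?utf-8?q?caf=C3=A9?=")
print(encoded)  # [(b'caf\xc3\xa9', 'utf-8')]
```

The first case is the one that has no .decode method on Python 3, so code that assumes every part is bytes breaks on it.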
-- 
Your team Launchpad code reviewers is requested to review the proposed merge of ~cjwatson/launchpad:py3-messageset-decode-header into launchpad:master.

diff --git a/lib/lp/services/messages/model/message.py b/lib/lp/services/messages/model/message.py
index 05d8a5d..28d70b8 100644
--- a/lib/lp/services/messages/model/message.py
+++ b/lib/lp/services/messages/model/message.py
@@ -254,21 +254,19 @@ class MessageSet:
         # Re-encode the header parts using utf-8, replacing undecodable
         # characters with question marks.
         re_encoded_bits = []
-        for bytes, charset in bits:
-            if charset is None:
-                charset = 'us-ascii'
+        for word, charset in bits:
             # 2008-09-26 gary:
             # The RFC 2047 encoding names and the Python encoding names are
             # not always the same. A safer and more correct approach would use
-            #   bytes.decode(email.charset.Charset(charset).input_codec,
-            #                'replace')
+            #   word.decode(email.charset.Charset(charset).input_codec,
+            #               'replace')
             # or similar, rather than
-            #   bytes.decode(charset, 'replace')
+            #   word.decode(charset, 'replace')
             # That said, this has not bitten us so far, and is only likely to
             # cause problems in unusual encodings that we are hopefully
             # unlikely to encounter in this part of the code.
-            re_encoded_bits.append(
-                (self.decode(bytes, charset).encode('utf-8'), 'utf-8'))
+            decoded = word if charset is None else self.decode(word, charset)
+            re_encoded_bits.append((decoded.encode('utf-8'), 'utf-8'))
 
         return six.text_type(email.header.make_header(re_encoded_bits))
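
For readers who want to try the re-encoding loop outside Launchpad, the following is a rough stand-alone sketch for Python 3, not the code in this branch. The function name decode_header_to_text is hypothetical, plain bytes.decode(charset, 'replace') stands in for MessageSet.decode (whose body is not shown in this diff), and treating bytes with no charset as us-ascii is an assumption carried over from the removed lines.

```python
from email.header import decode_header, make_header


def decode_header_to_text(header):
    """Hypothetical stand-alone analogue of MessageSet._decode_header."""
    re_encoded_bits = []
    for word, charset in decode_header(header):
        if isinstance(word, bytes):
            # Assumption: bytes without a charset are treated as us-ascii,
            # as the removed code did.  bytes.decode is a stand-in for
            # MessageSet.decode, which is not part of this diff.
            word = word.decode(charset or "us-ascii", "replace")
        # At this point word is always str, whichever form decode_header
        # returned, so it can be re-encoded uniformly as UTF-8.
        re_encoded_bits.append((word.encode("utf-8"), "utf-8"))
    return str(make_header(re_encoded_bits))


print(decode_header_to_text("Hello world"))
print(decode_header_to_text("=?utf-8?q?caf=C3=A9?="))
```

Checking the type of each part, rather than only its charset, keeps the sketch independent of which form decode_header chooses for a given header.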