openerp-india team mailing list archive

[Bug 921442] Re: [6.0/trunk] Encoding trouble in mail_message parsing and base_action_rule processing

To: openerp-india@xxxxxxxxxxxxxxxxxxx
From: Joël Grand-Guillaume @ CampToCamp <joel.grandguillaume@xxxxxxxxxxxxxx>
Date: Fri, 03 Feb 2012 08:49:18 -0000
Reply-to: Bug 921442 <921442@xxxxxxxxxxxxxxxxxx>
Sender: bounces@xxxxxxxxxxxxx

Hi there,


First, thank you for reviewing this one. I am coming back to you because I just hit it again on the latest trunk version. PLEASE ALSO FIX THE SECOND PART IN BOTH 6.1 AND 6.0:

Second part: fetching mail when some characters are broken in the
body_html part. You may well receive an email with broken characters
in it, and that should not block everything. Currently, when it
happens, no other mails are fetched, and you get a PostgreSQL error
when trying to write it to the DB: invalid byte sequence for encoding
"UTF8": 0xe96ce9


=> We really need this fix here. I just tested my patch and it also works on v6.1 (the first part as well as the second one).


To reproduce, you'll need to send a mail to the fetchmail address of OpenERP with a broken char in it, as in this script:

# Import smtplib for the actual sending function
import smtplib

# Import the email modules we'll need
from email.mime.text import MIMEText

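# Create a text/plain message whose body ends with an invalid byte
# (in Python 2, chr(255) is the single byte 0xff)
msg = MIMEText('abcdef' + chr(255))

# YOUR_EMAIL is the sender's email address
# TO_FETCHMAIL_OPENERP is the recipient's email address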
msg['Subject'] = 'The broken mail'
msg['From'] = 'YOUR_EMAIL'
msg['To'] = 'TO_FETCHMAIL_OPENERP'

# Send the message via our own SMTP server, but don't include the
# envelope header.
s = smtplib.SMTP('YOUR_SMTP_SERVER')
s.sendmail('YOUR_EMAIL', ['TO_FETCHMAIL_OPENERP'], msg.as_string())
s.quit()


Don't forget to replace these placeholders: YOUR_EMAIL, TO_FETCHMAIL_OPENERP, YOUR_SMTP_SERVER.

Or just type this in a Python console to see the proof:

>>> a = 'abcdef' + chr(255)
>>> a
'abcdef\xff'
>>> str(a)
'abcdef\xff'
>>> unicode(a)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeDecodeError: 'ascii' codec can't decode byte 0xff in position 6: ordinal not in range(128)

You can see that unicode() cannot handle it. This is essentially what
happens when PostgreSQL tries to store the mail, and it stops everything!
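
To illustrate why the database rejects such a mail, here is a minimal sketch (not taken from the patch) that applies the same kind of strict UTF-8 check in Python. The helper name is_valid_utf8 is hypothetical; the samples are the byte string from the script above and the three bytes quoted in the PostgreSQL error (0xe9 0x6c 0xe9).

```python
def is_valid_utf8(raw):
    """Hypothetical helper: True if the byte string decodes as strict UTF-8."""
    try:
        raw.decode('utf-8')  # strict mode: any invalid byte raises
        return True
    except UnicodeDecodeError:
        return False


samples = [
    b'abcdef',                   # plain ASCII, valid
    b'abcdef\xff',               # body sent by the reproduction script
    b'\xe9\x6c\xe9',             # bytes quoted in the PostgreSQL error
    u'\xe9l\xe9'.encode('utf-8'),  # the same text, properly encoded
]
results = [is_valid_utf8(s) for s in samples]
print(results)
```

Any value for which such a check fails would need to be cleaned before it is written to a UTF8 database.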


Thanks for your review,


Best regards,


Joël

-- 
You received this bug notification because you are a member of OpenERP
Indian Team, which is subscribed to OpenERP Addons.
https://bugs.launchpad.net/bugs/921442

Title:
  [6.0/trunk] Encoding trouble in mail_message parsing and
  base_action_rule processing

Status in OpenERP Addons (modules):
  New
Status in OpenERP Addons 6.0 series:
  Fix Committed
Status in OpenERP Addons trunk series:
  New

Bug description:
  hi,

  
  We've got an issue with encoding in the CRM part. It will be difficult to reproduce :( so please look at the code to understand the problem.

  First part of the bug: parsing the rules (base_action_rule) breaks if
  the regexp or the name of the resource (model) contains non-ASCII
  characters. In the function do_check you have:

  if reg_name:
      ptrn = re.compile(str(reg_name))
      _result = ptrn.search(str(obj.name))

  Calling str() here breaks if some non-ASCII characters are present in
  the object name or in the regexp itself.

  See my patch: I suggest simply removing the calls to str(), as both
  values are already unicode, which is perfect => no need to convert
  with str().
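
A minimal sketch of this first part, assuming both values are already unicode strings as described above. The helper name_matches and the sample values are hypothetical and are not the actual base_action_rule code; the explicit encode('ascii') only mimics what str() attempts on a unicode value in Python 2.

```python
import re


def name_matches(reg_name, name):
    """Hypothetical helper: search `name` with `reg_name`, without str()."""
    if not reg_name:
        return True  # no pattern configured, nothing to filter on
    return re.search(reg_name, name) is not None


pattern = u'Soci\xe9t\xe9'                 # regexp with accented characters
name = u'Soci\xe9t\xe9 G\xe9n\xe9rale'     # resource name with accents

matched = name_matches(pattern, name)

# What str() does implicitly on a unicode value in Python 2:
try:
    name.encode('ascii')
    ascii_ok = True
except UnicodeEncodeError:
    ascii_ok = False

print(matched, ascii_ok)
```

The regexp search works directly on the unicode values, while the ASCII conversion is the step that fails.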

  
  Second part: fetching mail when some characters are broken in the
  body_html part. You may well receive an email with broken characters
  in it, and that should not block everything. Currently, when it
  happens, no other mails are fetched, and you get a PostgreSQL error
  when trying to write it to the DB: invalid byte sequence for encoding
  "UTF8": 0xe96ce9

  In parse_message of mail_message.py, you already do everything needed
  to take care of the encoding; nothing to improve there, I think. But
  if the message (body_html) contains broken chars (I mean invalid
  ones, not an encoding problem, as in this example:
  unicode('abcdef' + chr(255))), then it breaks the mail fetching.

  As body_text is already converted to unicode, I suggest doing the
  same for body_html in my patch, but with the option errors='ignore'.
  This way we skip all non-conforming chars and ensure that the write
  method only writes valid chars to the DB.
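
A rough sketch of that idea, not the patch itself: the helper to_clean_unicode and the sample body are hypothetical, and UTF-8 is assumed as the source encoding. It decodes with the 'ignore' error handler so that invalid bytes are dropped instead of raising.

```python
def to_clean_unicode(raw, encoding='utf-8'):
    """Hypothetical helper: decode bytes, dropping invalid byte sequences."""
    if isinstance(raw, bytes):
        return raw.decode(encoding, 'ignore')
    return raw  # already text, nothing to do


broken_body = b'<p>abcdef\xff</p>'   # HTML body with one invalid byte
clean_body = to_clean_unicode(broken_body)
print(repr(clean_body))

# Alternative: 'replace' keeps a visible marker (U+FFFD) instead of
# silently dropping the invalid byte.
marked_body = broken_body.decode('utf-8', 'replace')
print(repr(marked_body))
```

With 'ignore' the broken characters are lost silently; 'replace' is an option if it matters to see where the mail was damaged.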

  The provided patch worked on more than 900 mails, so I think it's
  good.

  Thanks for your consideration,

  Regards,

  
  Joël

  
  Ref: http://docs.python.org/howto/unicode.html

To manage notifications about this bug go to:
https://bugs.launchpad.net/openobject-addons/+bug/921442/+subscriptions

References

[Bug 921442] [NEW] Encoding trouble in mail_message parsing and base_action_rule processing
From: Joël Grand-Guillaume, 2012-01-25