← Back to team overview

launchpad-dev team mailing list archive

[Merge] lp:~mbp/launchpad/mail-cleanup into lp:launchpad

 

You have been requested to review the proposed merge of lp:~mbp/launchpad/mail-cleanup into lp:launchpad.

For more details, see:
https://code.launchpad.net/~mbp/launchpad/mail-cleanup/+merge/58867

To add mail feature scopes, and to improve dkim support, I want to change the mail importer some more. 

However, the existing handleMail function is fairly enormous and deeply indented, so not so easy to safely change.  This branch just tries to tear it apart into smaller parts, including 

 * one function to save the incoming message into the librarian
 * one function to deal with an incoming message
 * one function to send an oops back to the user in appropriate cases

I also added some debug log statements that are useful when watching 'process-mail -vvv'.  I was made aware in a previous merge that warning or higher importance statements show up to sysadmins, so I have not added any of them.   

I also updated and expanded on some documentation about how to set up mail.

The file is now lint-clean.

I tested this with './bin/test -m mail'; please let me know if I should run any others.

Please review this carefully; the existing code has a lot of state in one function and a lot of branching on error conditions, and the diff is not easy to read because of the rearrangements.  On the other hand I feel the resulting code is a lot more straightforward.

Thanks.


-- 
https://code.launchpad.net/~mbp/launchpad/mail-cleanup/+merge/58867
Your team Launchpad Community Development Team is requested to review the proposed merge of lp:~mbp/launchpad/mail-cleanup into lp:launchpad.
=== modified file 'lib/canonical/launchpad/doc/mailbox.txt'
--- lib/canonical/launchpad/doc/mailbox.txt	2010-10-09 16:36:22 +0000
+++ lib/canonical/launchpad/doc/mailbox.txt	2011-04-28 06:53:25 +0000
@@ -21,9 +21,17 @@
 directory above the working tree and call it +testing-mailbox.zcml (actually
 you could call it anything that matches +*.zcml loaded from script.zcml).
 
-There is a helper function create_mail_for_directoryMailBox that has been
-added to the globs in `make harness`, and can be found in the file
-lib/canonical/launchpad/testing/mail.py.
+There is a helper function create_mail_for_directoryMailBox that can be
+run with something like this::
+ 
+    make harness
+    from lp.testing import mail
+    mail.create_mail_for_directoryMailBox(
+	'me@xxxxxxxxxxx',
+	'new@xxxxxxxxxxxxxxxxxx',
+	'help',
+	'message body',
+	)
 
 Let's look at the one we use in tests:
 

=== modified file 'lib/lp/services/mail/incoming.py'
--- lib/lp/services/mail/incoming.py	2011-03-04 03:56:09 +0000
+++ lib/lp/services/mail/incoming.py	2011-04-28 06:53:25 +0000
@@ -171,8 +171,8 @@
     message sender.
 
     :param signature_timestamp_checker: This callable is
-        passed the message signature timestamp, and it can raise an exception if
-        it dislikes it (for example as a replay attack.)  This parameter is
+        passed the message signature timestamp, and it can raise an exception
+        if it dislikes it (for example as a replay attack.)  This parameter is
         intended for use in tests.  If None, ensure_sane_signature_timestamp
         is used.
     """
@@ -182,6 +182,7 @@
     name, email_addr = parseaddr(mail['From'])
     authutil = getUtility(IPlacelessAuthUtility)
     principal = authutil.getPrincipalByLogin(email_addr)
+    log = logging.getLogger('process-mail')
 
     # Check that sender is registered in Launchpad and the email is signed.
     if principal is None:
@@ -203,16 +204,17 @@
 
     if dkim_result:
         if mail.signature is not None:
-            log = logging.getLogger('process-mail')
             log.info('message has gpg signature, therefore not treating DKIM '
                 'success as conclusive')
         else:
             setupInteraction(principal, email_addr)
+            log.debug('message strongly authenticated by dkim') 
             return principal
 
     if signature is None:
         # Mark the principal so that application code can check that the
         # user was weakly authenticated.
+        log.debug('message has no signature; therefore weakly authenticated')
         directlyProvides(
             principal, directlyProvidedBy(principal),
             IWeaklyAuthenticatedPrincipal)
@@ -237,11 +239,13 @@
 
     for gpgkey in person.gpg_keys:
         if gpgkey.fingerprint == sig.fingerprint:
+            log.debug('gpg-signed message')
             break
     else:
         # The key doesn't belong to the user. Mark the principal so that the
         # application code knows that the key used to sign the email isn't
         # associated with the authenticated user.
+        log.debug('gpg-signed message but by no known key of principal')
         directlyProvides(
             principal, directlyProvidedBy(principal),
             IWeaklyAuthenticatedPrincipal)
@@ -311,38 +315,6 @@
 
 def handleMail(trans=transaction,
                signature_timestamp_checker=None):
-    # First we define an error handler. We define it as a local
-    # function, to avoid having to pass a lot of parameters.
-    # pylint: disable-msg=W0631
-    def _handle_error(error_msg, file_alias_url, notify=True):
-        """Handles error occuring in handleMail's for-loop.
-
-        It does the following:
-
-            * deletes the current mail from the mailbox
-            * records an OOPS with error_msg and file_alias_url
-              if notify is True
-            * commits the current transaction to ensure that the
-              message gets sent
-        """
-        mailbox.delete(mail_id)
-        if notify:
-            msg = signed_message_from_string(raw_mail)
-            oops_id = report_oops(
-                file_alias_url=file_alias_url,
-                error_msg=error_msg)
-            send_process_error_notification(
-                msg['From'],
-                'Submit Request Failure',
-                get_error_message('oops.txt', oops_id=oops_id),
-                msg)
-        trans.commit()
-
-    def _handle_user_error(error, mail):
-        mailbox.delete(mail_id)
-        send_process_error_notification(
-            mail['From'], 'Submit Request Failure', str(error), mail)
-        trans.commit()
 
     log = logging.getLogger('process-mail')
     mailbox = getUtility(IMailBox)
@@ -352,122 +324,20 @@
         for mail_id, raw_mail in mailbox.items():
             log.info("Processing mail %s" % mail_id)
             try:
-                file_alias_url = None
-                trans.begin()
-
-                # File the raw_mail in the Librarian
-                file_name = str(uuid1()) + '.txt'
-                try:
-                    file_alias = getUtility(ILibraryFileAliasSet).create(
-                            file_name, len(raw_mail),
-                            cStringIO(raw_mail), 'message/rfc822')
-                except UploadFailed:
-                    # Something went wrong in the Librarian. It could be
-                    # that it's not running, but not necessarily. Log
-                    # the error and skip the message, but don't delete
-                    # it.
-                    log.exception('Upload to Librarian failed')
-                    continue
-
+                file_alias = save_mail_to_librarian(trans, log, raw_mail)
                 # Let's save the url of the file alias, otherwise we might not
                 # be able to access it later if we get a DB exception.
                 file_alias_url = file_alias.http_url
-
-                # If something goes wrong when handling the mail, the
-                # transaction will be aborted. Therefore we need to commit the
-                # transaction now, to ensure that the mail gets stored in the
-                # Librarian.
-                trans.commit()
-                trans.begin()
-
-                try:
-                    mail = signed_message_from_string(raw_mail)
-                except email.Errors.MessageError, error:
-                    mailbox.delete(mail_id)
-                    log = logging.getLogger('lp.services.mail')
-                    log.warn(
-                        "Couldn't convert email to email.Message: %s" % (
-                            file_alias_url, ),
-                        exc_info=True)
-                    continue
-
-
-                # If the Return-Path header is '<>', it probably means
-                # that it's a bounce from a message we sent.
-                if mail['Return-Path'] == '<>':
-                    _handle_error(
-                        "Message had an empty Return-Path.",
-                        file_alias_url, notify=False)
-                    continue
-                if mail.get_content_type() == 'multipart/report':
-                    # Mails with a content type of multipart/report are
-                    # generally DSN messages and should be ignored.
-                    _handle_error(
-                        "Got a multipart/report message.",
-                        file_alias_url, notify=False)
-                    continue
-                if 'precedence' in mail:
-                    _handle_error(
-                        "Got a message with a precedence header.",
-                        file_alias_url, notify=False)
-                    continue
-
-                try:
-                    principal = authenticateEmail(
-                        mail, signature_timestamp_checker)
-                except InvalidSignature, error:
-                    _handle_user_error(error, mail)
-                    continue
-                except InactiveAccount:
-                    _handle_error(
-                        "Inactive account found for %s" % mail['From'],
-                        file_alias_url, notify=False)
-                    continue
-
-                addresses = extract_addresses(
-                    mail, raw_mail, file_alias_url, log)
-
-                try:
-                    do_paranoid_envelope_to_validation(addresses)
-                except AssertionError, e:
-                    _handle_error(
-                        "Invalid email address: %s" % e,
-                        file_alias_url, notify=False)
-                    continue
-
-                handler = None
-                for email_addr in addresses:
-                    user, domain = email_addr.split('@')
-                    handler = mail_handlers.get(domain)
-                    if handler is not None:
-                        break
-
-                if handler is None:
-                    _handle_error(
-                        "No handler registered for '%s' " % (
-                            ', '.join(addresses)),
-                        file_alias_url)
-                    continue
-
-                if principal is None and not handler.allow_unknown_users:
-                    _handle_error(
-                        'Unknown user: %s ' % mail['From'],
-                        file_alias_url, notify=False)
-                    continue
-
-                handled = handler.process(mail, email_addr, file_alias)
-
-                if not handled:
-                    _handle_error(
-                        "Handler found, but message was not handled: %s" % (
-                            mail['From'], ),
-                        file_alias_url)
-                    continue
-
-                # Commit the transaction before deleting the mail in
-                # case there are any errors. If an error occur while
-                # commiting the transaction, the mail will be deleted in
-                # the exception handler.
+                log.debug('Uploaded mail to librarian %s' % (file_alias_url,))
+            except UploadFailed:
+                # Something went wrong in the Librarian. It could be that it's
+                # not running, but not necessarily. Log the error and skip the
+                # message, but don't delete it: retrying might help.
+                log.exception('Upload to Librarian failed')
+                continue
+            try:
+                handle_one_mail(trans, log, raw_mail,
+                    file_alias, file_alias_url, signature_timestamp_checker)
                 trans.commit()
                 mailbox.delete(mail_id)
             except (KeyboardInterrupt, SystemExit):
@@ -475,21 +345,128 @@
             except:
                 # This bare except is needed in order to prevent any bug
                 # in the email handling from causing the email interface
-                # to lock up. If an email causes an unexpected
-                # exception, we simply log the error and delete the
-                # email, so that it doesn't stop the rest of the emails
-                # from being processed.
-                _handle_error(
+                # to lock up. We simply log the error, then send an oops, and
+                # continue through the mailbox, so that it doesn't stop the
+                # rest of the emails from being processed.
+                log.exception(
+                    "An exception was raised inside the handler:\n%s"
+                    % (file_alias_url,))
+                _send_email_oops(trans, log,
+                    signed_message_from_string(raw_mail), raw_mail,
                     "Unhandled exception", file_alias_url)
-                if file_alias_url is not None:
-                    email_info = file_alias_url
-                else:
-                    email_info = raw_mail
-
-                log.error(
-                    "An exception was raised inside the handler:\n%s" % (
-                        email_info),
-                    exc_info=True)
+                mailbox.delete(mail_id)
     finally:
         log.info("Closing the mail box.")
         mailbox.close()
+
+
+def _send_email_oops(trans, log, mail, raw_mail, error_msg, file_alias_url):
+    """Handle an error that generates an oops.
+
+    It does the following:
+        * records an OOPS with error_msg and file_alias_url
+        * commits the current transaction to ensure that the
+            message gets sent
+    """
+    log.info('error processing mail: %s' % (error_msg,))
+    oops_id = report_oops(
+        file_alias_url=file_alias_url,
+        error_msg=error_msg)
+    log.info('oops %s' % (oops_id,))
+    send_process_error_notification(
+        mail['From'],
+        'Submit Request Failure',
+        get_error_message('oops.txt', oops_id=oops_id),
+        mail)
+    trans.commit()
+
+
+def save_mail_to_librarian(trans, log, raw_mail):
+    """Save the message to the librarian, for reference from errors."""
+    trans.begin()
+    # File the raw_mail in the Librarian
+    file_name = str(uuid1()) + '.txt'
+    file_alias = getUtility(ILibraryFileAliasSet).create(
+            file_name, len(raw_mail),
+            cStringIO(raw_mail), 'message/rfc822')
+    # If something goes wrong when handling the mail, the
+    # transaction will be aborted. Therefore we need to commit the
+    # transaction now, to ensure that the mail gets stored in the
+    # Librarian.
+    trans.commit()
+    return file_alias
+
+
+def handle_one_mail(trans, log, raw_mail, file_alias, file_alias_url,
+    signature_timestamp_checker):
+    """Process one message.
+
+    Returns None when the message has either been successfully processed, or
+    handled as a known error condition, in which case a reply will have been
+    sent if appropriate.
+    """
+
+    trans.begin()
+
+    try:
+        mail = signed_message_from_string(raw_mail)
+    except email.Errors.MessageError, error:
+        log.warn("Couldn't convert email to email.Message: %s" % (
+                file_alias_url, ),
+            exc_info=True)
+        return
+    log.debug('processing mail from %r message-id %r' % 
+        (mail['from'], mail['message-id']))
+
+    # If the Return-Path header is '<>', it probably means
+    # that it's a bounce from a message we sent.
+    if mail['Return-Path'] == '<>':
+        log.info("Message had an empty Return-Path.")
+        return
+    if mail.get_content_type() == 'multipart/report':
+        # Mails with a content type of multipart/report are
+        # generally DSN messages and should be ignored.
+        log.info("Got a multipart/report message.")
+        return
+    if 'precedence' in mail:
+        log.info("Got a message with a precedence header.")
+        return
+
+    try:
+        principal = authenticateEmail(
+            mail, signature_timestamp_checker)
+    except InvalidSignature, error:
+        send_process_error_notification(
+            mail['From'], 'Submit Request Failure', str(error), mail)
+        return
+    except InactiveAccount:
+        log.info("Inactive account found for %s" % mail['From'])
+        return
+
+    addresses = extract_addresses(mail, raw_mail, file_alias_url, log)
+    log.debug('mail was originally to: %r' % (addresses,))
+
+    try:
+        do_paranoid_envelope_to_validation(addresses)
+    except AssertionError, e:
+        log.info("Invalid email address: %s" % e)
+        return
+
+    handler = None
+    for email_addr in addresses:
+        user, domain = email_addr.split('@')
+        handler = mail_handlers.get(domain)
+        if handler is not None:
+            break
+    else:
+        raise AssertionError(
+            "No handler registered for '%s' " % (', '.join(addresses)))
+
+    if principal is None and not handler.allow_unknown_users:
+        log.info('Unknown user: %s ' % mail['From'])
+        return
+
+    handled = handler.process(mail, email_addr, file_alias)
+    if not handled:
+        raise AssertionError(
+            "Handler found, but message was not handled")