← Back to team overview

launchpad-reviewers team mailing list archive

[Merge] lp:~sinzui/launchpad/mailman-oops-reporting-0 into lp:launchpad

 

Curtis Hovey has proposed merging lp:~sinzui/launchpad/mailman-oops-reporting-0 into lp:launchpad.

Requested reviews:
  Launchpad code reviewers (launchpad-reviewers)

For more details, see:
https://code.launchpad.net/~sinzui/launchpad/mailman-oops-reporting-0/+merge/46632

Add oops reporting and recovery to all uses the the xmlrpc proxy.

    Launchpad bug: https://bugs.launchpad.net/bugs/218923
    Pre-implementation: diogo, barry
    Test command: ./bin/test -vv \
      -t test_xmlrpcrunner -t test_lphandler -t test_lpstanding

In some cases in the XMLRPCRunner, a failure in one mailing list can cascade
and kill the queue runner. These places should be protected with try/excepts
so that a failure in one list won't bring down the entire queue runner.

The recent outages were 501 and 503 from the proxy. All calls the to proxy
need to be wrapped to report outages.

ADDENDUM:
XMLRPCRunner has try-except block around each call the the proxy,
the lphandler and lp standing does not. The error logs show that lphandler
is where the last outage happened. Also the commenting in the handler imply
the message is re-queued in the incoming queue, but the log showed they
were placed in the shunted queue.

--------------------------------------------------------------------

RULES

    * Add try/except blocks to all calls to the proxy and report oopses.
    * In handlers, instantiate the re-enqueue the message in the incoming
      queue, then raise DiscardMessage to remove the copy.


QA

    * Staging's error report is expected to contain many mailman oopses
      after a data sync because the db cannot respond to the load.


LINT

    lib/lp/services/mailman/monkeypatches/lphandler.py
    lib/lp/services/mailman/monkeypatches/lpstanding.py
    lib/lp/services/mailman/monkeypatches/xmlrpcrunner.py
    lib/lp/services/mailman/testing/__init__.py
    lib/lp/services/mailman/tests/test_lphandler.py
    lib/lp/services/mailman/tests/test_lpstanding.py
    lib/lp/services/mailman/tests/test_xmlrpcrunner.py



IMPLEMENTATION

Added handle_proxy_error() to handle exception logging and re-enqueue of
messages. Moved reset_log() and reset_mark() to the base class so that they
can be reused. Added a test help to assert that the message is re-enqueued.
    lib/lp/services/mailman/monkeypatches/xmlrpcrunner.py
    lib/lp/services/mailman/testing/__init__.py
    lib/lp/services/mailman/tests/test_xmlrpcrunner.py

Wrapped the proxy in try-except calls and used handle_proxy_error() to
recover.
    lib/lp/services/mailman/monkeypatches/lphandler.py
    lib/lp/services/mailman/monkeypatches/lpstanding.py
    lib/lp/services/mailman/tests/test_lphandler.py
    lib/lp/services/mailman/tests/test_lpstanding.py
-- 
https://code.launchpad.net/~sinzui/launchpad/mailman-oops-reporting-0/+merge/46632
Your team Launchpad code reviewers is requested to review the proposed merge of lp:~sinzui/launchpad/mailman-oops-reporting-0 into lp:launchpad.
=== modified file 'lib/lp/services/mailman/monkeypatches/lphandler.py'
--- lib/lp/services/mailman/monkeypatches/lphandler.py	2010-12-14 05:12:51 +0000
+++ lib/lp/services/mailman/monkeypatches/lphandler.py	2011-01-18 17:15:32 +0000
@@ -46,7 +46,10 @@
     # This will fail if we can't talk to Launchpad.  That's okay though
     # because Mailman's IncomingRunner will re-queue the message and re-start
     # processing at this handler.
-    is_member = proxy.isRegisteredInLaunchpad(sender)
+    try:
+        is_member = proxy.isRegisteredInLaunchpad(sender)
+    except Exception, error:
+        XMLRPCRunner.handle_proxy_error(error, msg, msgdata)
     # This handler can just return if the sender is a member of Launchpad.
     if is_member:
         return

=== modified file 'lib/lp/services/mailman/monkeypatches/lpstanding.py'
--- lib/lp/services/mailman/monkeypatches/lpstanding.py	2010-12-13 23:37:37 +0000
+++ lib/lp/services/mailman/monkeypatches/lpstanding.py	2011-01-18 17:15:32 +0000
@@ -28,7 +28,10 @@
     # This will fail if we can't talk to Launchpad.  That's okay though
     # because Mailman's IncomingRunner will re-queue the message and re-start
     # processing at this handler.
-    in_good_standing = proxy.inGoodStanding(sender)
+    try:
+        in_good_standing = proxy.inGoodStanding(sender)
+    except Exception, error:
+        XMLRPCRunner.handle_proxy_error(error, msg, msgdata)
     # If the sender is a member in good standing, that's all we need to know
     # in order to let the message pass.
     if in_good_standing:

=== modified file 'lib/lp/services/mailman/monkeypatches/xmlrpcrunner.py'
--- lib/lp/services/mailman/monkeypatches/xmlrpcrunner.py	2011-01-13 17:16:24 +0000
+++ lib/lp/services/mailman/monkeypatches/xmlrpcrunner.py	2011-01-18 17:15:32 +0000
@@ -5,6 +5,7 @@
 
 __all__ = [
     'get_mailing_list_api_proxy',
+    'handle_proxy_error',
     'XMLRPCRunner',
     ]
 
@@ -33,6 +34,7 @@
 from Mailman.Logging.Syslog import syslog
 from Mailman.MailList import MailList
 from Mailman.Queue.Runner import Runner
+from Mailman.Queue.sbcache import get_switchboard
 
 # XXX sinzui 2008-08-15 bug=258423:
 # We should be importing from lazr.errorlog.
@@ -79,6 +81,24 @@
     syslog('error', traceback_text)
 
 
+def handle_proxy_error(error, msg=None, msgdata=None):
+    """Log the error and enqueue the message if needed.
+
+    :param error: The error to log.
+    :param msg: An optional message to re-enqueue.
+    :param msgdata: The message data to enque with the message.
+    :raise DiscardMessage: When a message is enqueued.
+    """
+    if isinstance(error, (xmlrpclib.ProtocolError, socket.error)):
+        log_exception('Cannot talk to Launchpad:\n%s', error)
+    else:
+        log_exception('Launchpad exception: %s', error)
+    if msg is not None:
+        queue = get_switchboard(mm_cfg.INQUEUE_DIR)
+        queue.enqueue(msg, msgdata)
+        raise Errors.DiscardMessage
+
+
 class XMLRPCRunner(Runner):
     """A Mailman 'queue runner' for talking to the Launchpad XMLRPC service.
     """

=== modified file 'lib/lp/services/mailman/testing/__init__.py'
--- lib/lp/services/mailman/testing/__init__.py	2010-12-15 20:05:29 +0000
+++ lib/lp/services/mailman/testing/__init__.py	2011-01-18 17:15:32 +0000
@@ -18,6 +18,9 @@
     mm_cfg,
     )
 from Mailman.Queue import XMLRPCRunner
+from Mailman.Logging.Syslog import syslog
+from Mailman.Queue.sbcache import get_switchboard
+
 from zope.security.proxy import removeSecurityProxy
 
 from canonical.testing.layers import DatabaseFunctionalLayer
@@ -107,3 +110,48 @@
         mm_message = email.message_from_string(
             message.as_string(), Message.Message)
         return mm_message
+
+    def get_log_entry(self, match_text):
+        """Return the first matched text line found in the log."""
+        log_path = syslog._logfiles['xmlrpc']._Logger__filename
+        mark = None
+        with open(log_path, 'r') as log_file:
+            for line in log_file.readlines():
+                if match_text in line:
+                    mark = line
+                    break
+        return mark
+
+    def get_mark(self):
+        """Return the --MARK-- entry from the log or None."""
+        return self.get_log_entry('--MARK--')
+
+    def reset_log(self):
+        """Truncate the log."""
+        log_path = syslog._logfiles['xmlrpc']._Logger__filename
+        syslog._logfiles['xmlrpc'].close()
+        with open(log_path, 'w') as log_file:
+            log_file.truncate()
+        syslog.write_ex('xmlrpc', 'Reset by test.')
+
+    def assertIsEnqueued(self, msg):
+        """Assert the message was appended to the incoming queue."""
+        switchbaord = get_switchboard(mm_cfg.INQUEUE_DIR)
+        file_path = switchbaord.files()[-1]
+        queued_msg, queued_msg_data = switchbaord.dequeue(file_path)
+        self.assertEqual(msg['message-id'], queued_msg['message-id'])
+
+    @contextmanager
+    def raise_proxy_exception(self, method_name):
+        """Raise an exception when calling the passed proxy method name."""
+
+        def raise_exception(*args):
+            raise Exception('Test exception handling.')
+
+        proxy = XMLRPCRunner.get_mailing_list_api_proxy()
+        original_method = getattr(proxy.__class__, method_name)
+        setattr(proxy.__class__, method_name, raise_exception)
+        try:
+            yield
+        finally:
+            setattr(proxy.__class__, method_name, original_method)

=== modified file 'lib/lp/services/mailman/tests/test_lphandler.py'
--- lib/lp/services/mailman/tests/test_lphandler.py	2010-12-15 00:49:19 +0000
+++ lib/lp/services/mailman/tests/test_lphandler.py	2011-01-18 17:15:32 +0000
@@ -81,3 +81,17 @@
         silence = LaunchpadMember.process(self.mm_list, message, msg_data)
         self.assertEqual(None, silence)
         self.assertEqual(True, msg_data['approved'])
+
+    def test_proxy_error_retries_message(self):
+        # When the Launchpad xmlrpc proxy raises an error, the message
+        # is re-enqueed.
+        lp_user_email = 'groundhog@xxxxxx'
+        lp_user = self.factory.makePerson(email=lp_user_email)
+        message = self.makeMailmanMessage(
+            self.mm_list, lp_user_email, 'subject', 'any content.')
+        msg_data = {}
+        with self.raise_proxy_exception('isRegisteredInLaunchpad'):
+            args = (self.mm_list, message, msg_data)
+            self.assertRaises(
+                Errors.DiscardMessage, LaunchpadMember.process, *args)
+            self.assertIsEnqueued(message)

=== modified file 'lib/lp/services/mailman/tests/test_lpstanding.py'
--- lib/lp/services/mailman/tests/test_lpstanding.py	2010-12-14 05:12:51 +0000
+++ lib/lp/services/mailman/tests/test_lpstanding.py	2011-01-18 17:15:32 +0000
@@ -5,6 +5,7 @@
 __metaclass__ = type
 __all__ = []
 
+from Mailman import Errors
 from Mailman.Handlers import LPStanding
 
 from canonical.testing.layers import DatabaseFunctionalLayer
@@ -57,3 +58,15 @@
         silence = LPStanding.process(self.mm_list, message, msg_data)
         self.assertEqual(None, silence)
         self.assertTrue(msg_data['approved'])
+
+    def test_proxy_error_retries_message(self):
+        # When the Launchpad xmlrpc proxy raises an error, the message
+        # is re-enqueed.
+        message = self.makeMailmanMessage(
+            self.mm_list, self.lp_user_email, 'subject', 'any content.')
+        msg_data = {}
+        with self.raise_proxy_exception('inGoodStanding'):
+            args = (self.mm_list, message, msg_data)
+            self.assertRaises(
+                Errors.DiscardMessage, LPStanding.process, *args)
+            self.assertIsEnqueued(message)

=== modified file 'lib/lp/services/mailman/tests/test_xmlrpcrunner.py'
--- lib/lp/services/mailman/tests/test_xmlrpcrunner.py	2011-01-13 19:18:41 +0000
+++ lib/lp/services/mailman/tests/test_xmlrpcrunner.py	2011-01-18 17:15:32 +0000
@@ -7,9 +7,14 @@
 
 from contextlib import contextmanager
 from datetime import datetime
+import socket
 
+from Mailman import Errors
 from Mailman.Logging.Syslog import syslog
-from Mailman.Queue.XMLRPCRunner import XMLRPCRunner
+from Mailman.Queue.XMLRPCRunner import (
+    handle_proxy_error,
+    XMLRPCRunner,
+    )
 
 from canonical.testing.layers import DatabaseFunctionalLayer
 from lp.services.mailman.testing import (
@@ -25,6 +30,7 @@
     This function replaces _check_list_actions() with a function that
     raises an error. _oneloop() handles the exception.
     """
+
     def raise_exception():
         raise Exception('Test exception handling.')
 
@@ -51,25 +57,6 @@
         # the runner had a reference to the true proxy in its __init__.
         self.runner._proxy = get_mailing_list_api_test_proxy()
 
-    def get_mark(self):
-        """Return the first mark line found in the log."""
-        log_path = syslog._logfiles['xmlrpc']._Logger__filename
-        mark = None
-        with open(log_path, 'r') as log_file:
-            for line in log_file.readlines():
-                if '--MARK--' in line:
-                    mark = line
-                    break
-        return mark
-
-    def reset_log(self):
-        """Truncate the log."""
-        log_path = syslog._logfiles['xmlrpc']._Logger__filename
-        syslog._logfiles['xmlrpc'].close()
-        with open(log_path, 'w') as log_file:
-            log_file.truncate()
-        syslog.write_ex('xmlrpc', 'Reset by test.')
-
     def test_heartbeat_on_start(self):
         # A heartbeat is recorded in the log on start.
         mark = self.get_mark()
@@ -107,3 +94,42 @@
             self.runner._oneloop()
         mark = self.get_mark()
         self.assertTrue(mark is None)
+
+
+class TestHandleProxyError(MailmanTestCase):
+    """Test XMLRPCRunner.handle_proxy_error function."""
+
+    layer = DatabaseFunctionalLayer
+
+    def setUp(self):
+        super(TestHandleProxyError, self).setUp()
+        self.team, self.mailing_list = self.factory.makeTeamAndMailingList(
+            'team-1', 'team-1-owner')
+        self.mm_list = self.makeMailmanList(self.mailing_list)
+        syslog.write_ex('xmlrpc', 'Ensure the log is open.')
+        self.reset_log()
+
+    def test_communication_log_entry(self):
+        # Connection errors are reported in the log.
+        error = socket.error('Testing socket error.')
+        handle_proxy_error(error)
+        mark = self.get_log_entry('Cannot talk to Launchpad:')
+        self.assertTrue(mark is not None)
+
+    def test_fault_log_entry(self):
+        # Fault errors are reported in the log.
+        error = Exception('Testing generic error.')
+        handle_proxy_error(error)
+        mark = self.get_log_entry('Launchpad exception:')
+        self.assertTrue(mark is not None)
+
+    def test_message_raises_discard_message_error(self):
+        # When message is passed to the function, DiscardMessage is raised
+        # and the message is re-enqueued in the incoming queue.
+        error = Exception('Testing generic error.')
+        msg = self.makeMailmanMessage(
+            self.mm_list, 'lost@xxxxxxxxxxx', 'subject', 'any content.')
+        msg_data = {}
+        self.assertRaises(
+            Errors.DiscardMessage, handle_proxy_error, error, msg, msg_data)
+        self.assertIsEnqueued(msg)