← Back to team overview

launchpad-reviewers team mailing list archive

[Merge] lp:~sinzui/launchpad/shutdown-mailman into lp:launchpad

 

Curtis Hovey has proposed merging lp:~sinzui/launchpad/shutdown-mailman into lp:launchpad.

Commit message:
Update the mailman xmlrpc runner to exit early when the service is stopped.

Requested reviews:
  Launchpad code reviewers (launchpad-reviewers)
Related bugs:
  Bug #753306 in Launchpad itself: "mailman doesn't shut down cleanly"
  https://bugs.launchpad.net/launchpad/+bug/753306

For more details, see:
https://code.launchpad.net/~sinzui/launchpad/shutdown-mailman/+merge/141967

During the last few deployments (possibly longer) mailman hasn't shut
down cleaning. We run the initscript stop -
https://pastebin.canonical.com/45791/ - and the mailmanctl process is
still running and needs to manually killed. This makes mailman a tough
target for a "nodowntime" deployment as it would block on failing to
shut down.


RULES

    Pre-implementation: gnuoy
    * We see that the xmlrpc queue runner continues to run after the rest
      of mailman has shutdown.
      * Other queues process 1 file/message per oneloop() and the longest
        we have seen a process run is about 3 minutes.
      * The xmlrpc runner does not process files, it make a series of
        network calls that take an average of 15 minutes to complete
        per oneloop().
    * The base runner class provides shortcircuit() which can be checked
      by oneloop() to exit early. The get_subscriptions() method takes
      up most of the time oneloop() uses; the method is actually doing
      batching work in a subloop. The subloop can also check to exit early.


QA

    * Stop staging's mailman to verify that shutdown completes quickly,
      and that it can be restarted without a paused process. We have
      verified that staging behaves exactly like production, taking
      between 5 and 15 minutes to shutdown the xmlrpc queue, and the main
      mailman process is left behind.


LINT

    lib/lp/services/mailman/monkeypatches/xmlrpcrunner.py
    lib/lp/services/mailman/tests/test_xmlrpcrunner.py


LoC

    I have more than 10,000 lines of credit this week.


TEST

    ./bin/test -vc -t OneLoop lp.services.mailman


IMPLEMENTATION

I added a call to runner._shortcircuit() to the _get_subscriptions() method.
I did the same to the _oneloop() method, but I changed the method to
loop over the all the methods to avoid repeated guards around each
method.
    lib/lp/services/mailman/monkeypatches/xmlrpcrunner.py
    lib/lp/services/mailman/tests/test_xmlrpcrunner.py
-- 
https://code.launchpad.net/~sinzui/launchpad/shutdown-mailman/+merge/141967
Your team Launchpad code reviewers is requested to review the proposed merge of lp:~sinzui/launchpad/shutdown-mailman into lp:launchpad.
=== modified file 'lib/lp/services/mailman/monkeypatches/xmlrpcrunner.py'
--- lib/lp/services/mailman/monkeypatches/xmlrpcrunner.py	2012-12-26 01:04:05 +0000
+++ lib/lp/services/mailman/monkeypatches/xmlrpcrunner.py	2013-01-04 19:22:21 +0000
@@ -138,10 +138,15 @@
         This method always returns 0 to indicate to the base class's main loop
         that it should sleep for a while after calling this method.
         """
+        methods = (
+            self._check_list_actions, self._get_subscriptions,
+            self._check_held_messages)
         try:
-            self._check_list_actions()
-            self._get_subscriptions()
-            self._check_held_messages()
+            for method in methods:
+                if self._shortcircuit():
+                    # The runner was  shutdown during the long network call.
+                    break
+                method()
         except:
             # Log the exception and report an OOPS.
             log_exception('Unexpected XMLRPCRunner exception')
@@ -345,6 +350,9 @@
         # different.
         shuffle(lists)
         while lists:
+            if self._shortcircuit():
+                # Stop was called during a long network call.
+                return
             batch = lists[:mm_cfg.XMLRPC_SUBSCRIPTION_BATCH_SIZE]
             lists = lists[mm_cfg.XMLRPC_SUBSCRIPTION_BATCH_SIZE:]
             ## syslog('xmlrpc', 'batch: %s', batch)

=== modified file 'lib/lp/services/mailman/tests/test_xmlrpcrunner.py'
--- lib/lp/services/mailman/tests/test_xmlrpcrunner.py	2013-01-04 05:42:01 +0000
+++ lib/lp/services/mailman/tests/test_xmlrpcrunner.py	2013-01-04 19:22:21 +0000
@@ -40,6 +40,7 @@
 from lp.services.xmlrpc import Transport
 from lp.testing import (
     person_logged_in,
+    monkey_patch,
     TestCase,
     )
 from lp.testing.fixture import CaptureOops
@@ -379,6 +380,20 @@
         with locked_list(mm_list_2):
             self.assertEqual(1, mm_list_2.isMember(lp_user_email))
 
+    def test_get_subscriptions_shortcircut(self):
+        # The method exist earlty without completing the update when
+        # the runner is stopping.
+        team, mailing_list = self.makeTeamList('team-1', 'owner-1')
+        lp_user_email = 'albatros@xxxxxx'
+        lp_user = self.factory.makePerson(name='albatros', email=lp_user_email)
+        with person_logged_in(lp_user):
+            # The factory person has auto join mailing list enabled.
+            lp_user.join(team)
+        self.runner.stop()
+        self.runner._get_subscriptions()
+        with locked_list(self.mm_list):
+            self.assertEqual(0, self.mm_list.isMember(lp_user_email))
+
     def test_constructing_to_active_recovery(self):
         # Lp is informed of the active list if it wrongly believes it is
         # being constructed.
@@ -418,3 +433,25 @@
             MailingListStatus.UPDATING)
         self.runner._oneloop()
         self.assertEqual(MailingListStatus.ACTIVE, mailing_list.status)
+
+    def test_shortcircuit(self):
+        # Oneloop will exit early if the runner is stopping.
+        class State:
+
+            def __init__(self):
+                self.checked = None
+
+        shortcircut = State()
+
+        def fake_called():
+            shortcircut.checked = False
+
+        def fake_stop():
+            shortcircut.checked = True
+            self.runner.stop()
+
+        with monkey_patch(self.runner,
+                _check_list_actions=fake_stop, _get_subscriptions=fake_called):
+            self.runner._oneloop()
+            self.assertTrue(self.runner._shortcircuit())
+            self.assertTrue(shortcircut.checked)


Follow ups