launchpad-reviewers team mailing list archive
-
launchpad-reviewers team
-
Mailing list archive
-
Message #14890
[Merge] lp:~sinzui/launchpad/shutdown-mailman into lp:launchpad
Curtis Hovey has proposed merging lp:~sinzui/launchpad/shutdown-mailman into lp:launchpad.
Commit message:
Update the mailman xmlrpc runner to exit early when the service is stopped.
Requested reviews:
Launchpad code reviewers (launchpad-reviewers)
Related bugs:
Bug #753306 in Launchpad itself: "mailman doesn't shut down cleanly"
https://bugs.launchpad.net/launchpad/+bug/753306
For more details, see:
https://code.launchpad.net/~sinzui/launchpad/shutdown-mailman/+merge/141967
During the last few deployments (possibly longer) mailman hasn't shut
down cleaning. We run the initscript stop -
https://pastebin.canonical.com/45791/ - and the mailmanctl process is
still running and needs to manually killed. This makes mailman a tough
target for a "nodowntime" deployment as it would block on failing to
shut down.
RULES
Pre-implementation: gnuoy
* We see that the xmlrpc queue runner continues to run after the rest
of mailman has shutdown.
* Other queues process 1 file/message per oneloop() and the longest
we have seen a process run is about 3 minutes.
* The xmlrpc runner does not process files, it make a series of
network calls that take an average of 15 minutes to complete
per oneloop().
* The base runner class provides shortcircuit() which can be checked
by oneloop() to exit early. The get_subscriptions() method takes
up most of the time oneloop() uses; the method is actually doing
batching work in a subloop. The subloop can also check to exit early.
QA
* Stop staging's mailman to verify that shutdown completes quickly,
and that it can be restarted without a paused process. We have
verified that staging behaves exactly like production, taking
between 5 and 15 minutes to shutdown the xmlrpc queue, and the main
mailman process is left behind.
LINT
lib/lp/services/mailman/monkeypatches/xmlrpcrunner.py
lib/lp/services/mailman/tests/test_xmlrpcrunner.py
LoC
I have more than 10,000 lines of credit this week.
TEST
./bin/test -vc -t OneLoop lp.services.mailman
IMPLEMENTATION
I added a call to runner._shortcircuit() to the _get_subscriptions() method.
I did the same to the _oneloop() method, but I changed the method to
loop over the all the methods to avoid repeated guards around each
method.
lib/lp/services/mailman/monkeypatches/xmlrpcrunner.py
lib/lp/services/mailman/tests/test_xmlrpcrunner.py
--
https://code.launchpad.net/~sinzui/launchpad/shutdown-mailman/+merge/141967
Your team Launchpad code reviewers is requested to review the proposed merge of lp:~sinzui/launchpad/shutdown-mailman into lp:launchpad.
=== modified file 'lib/lp/services/mailman/monkeypatches/xmlrpcrunner.py'
--- lib/lp/services/mailman/monkeypatches/xmlrpcrunner.py 2012-12-26 01:04:05 +0000
+++ lib/lp/services/mailman/monkeypatches/xmlrpcrunner.py 2013-01-04 19:22:21 +0000
@@ -138,10 +138,15 @@
This method always returns 0 to indicate to the base class's main loop
that it should sleep for a while after calling this method.
"""
+ methods = (
+ self._check_list_actions, self._get_subscriptions,
+ self._check_held_messages)
try:
- self._check_list_actions()
- self._get_subscriptions()
- self._check_held_messages()
+ for method in methods:
+ if self._shortcircuit():
+ # The runner was shutdown during the long network call.
+ break
+ method()
except:
# Log the exception and report an OOPS.
log_exception('Unexpected XMLRPCRunner exception')
@@ -345,6 +350,9 @@
# different.
shuffle(lists)
while lists:
+ if self._shortcircuit():
+ # Stop was called during a long network call.
+ return
batch = lists[:mm_cfg.XMLRPC_SUBSCRIPTION_BATCH_SIZE]
lists = lists[mm_cfg.XMLRPC_SUBSCRIPTION_BATCH_SIZE:]
## syslog('xmlrpc', 'batch: %s', batch)
=== modified file 'lib/lp/services/mailman/tests/test_xmlrpcrunner.py'
--- lib/lp/services/mailman/tests/test_xmlrpcrunner.py 2013-01-04 05:42:01 +0000
+++ lib/lp/services/mailman/tests/test_xmlrpcrunner.py 2013-01-04 19:22:21 +0000
@@ -40,6 +40,7 @@
from lp.services.xmlrpc import Transport
from lp.testing import (
person_logged_in,
+ monkey_patch,
TestCase,
)
from lp.testing.fixture import CaptureOops
@@ -379,6 +380,20 @@
with locked_list(mm_list_2):
self.assertEqual(1, mm_list_2.isMember(lp_user_email))
+ def test_get_subscriptions_shortcircut(self):
+ # The method exist earlty without completing the update when
+ # the runner is stopping.
+ team, mailing_list = self.makeTeamList('team-1', 'owner-1')
+ lp_user_email = 'albatros@xxxxxx'
+ lp_user = self.factory.makePerson(name='albatros', email=lp_user_email)
+ with person_logged_in(lp_user):
+ # The factory person has auto join mailing list enabled.
+ lp_user.join(team)
+ self.runner.stop()
+ self.runner._get_subscriptions()
+ with locked_list(self.mm_list):
+ self.assertEqual(0, self.mm_list.isMember(lp_user_email))
+
def test_constructing_to_active_recovery(self):
# Lp is informed of the active list if it wrongly believes it is
# being constructed.
@@ -418,3 +433,25 @@
MailingListStatus.UPDATING)
self.runner._oneloop()
self.assertEqual(MailingListStatus.ACTIVE, mailing_list.status)
+
+ def test_shortcircuit(self):
+ # Oneloop will exit early if the runner is stopping.
+ class State:
+
+ def __init__(self):
+ self.checked = None
+
+ shortcircut = State()
+
+ def fake_called():
+ shortcircut.checked = False
+
+ def fake_stop():
+ shortcircut.checked = True
+ self.runner.stop()
+
+ with monkey_patch(self.runner,
+ _check_list_actions=fake_stop, _get_subscriptions=fake_called):
+ self.runner._oneloop()
+ self.assertTrue(self.runner._shortcircuit())
+ self.assertTrue(shortcircut.checked)
Follow ups