← Back to team overview

launchpad-reviewers team mailing list archive

[Merge] lp:~flacoste/launchpad/bug-688503 into lp:launchpad

 

Francis J. Lacoste has proposed merging lp:~flacoste/launchpad/bug-688503 into lp:launchpad.

Requested reviews:
  Launchpad code reviewers (launchpad-reviewers)


Hi,

(Looks like I'm missing a plugin...)

This implements the LOSA request described in bug 688503. It adds a view at
/+haproxy that return status 200 or 500 based on a global flag. That flag is
controlled through the SIGHUP signal (suggested by elmo).

This URL will be used as the probe URL in HAProxy. Normally, it returns 200
and this tells HAProxy that the app server is functioning normally. When that
requests fails or returns 500, HAProxy will stop sending requests to it.

This will allow to restart app servers without interrupting user requests, as
LOSA will send the HUP signal which will stop HAProxy from dispatching
requests to it. The deployment script will then monitor the HAProxy status
board until all existing requests dispatched are completed. The app server
will then be able to be restarted.

Previously the URL used was /+opstats and we have some exceptional rules for
that view. I've implemented the same exceptions for this as it should also not
talk to the DB and be excluded from the PPR reports.

Tests can be run with test -vvm canonical.launchpad.webapp.tests.test_haproxy
and canonical.launchpad.webapp.tests.test_sighup.

QA can be done by sedning the HUP signal to the app server, looking at the log
and visiting the /+haproxy url.

Let me know if you have any questions.
-- 
https://code.launchpad.net/~flacoste/launchpad/bug-688503/+merge/44102
Your team Launchpad code reviewers is requested to review the proposed merge of lp:~flacoste/launchpad/bug-688503 into lp:launchpad.
=== modified file 'lib/canonical/launchpad/webapp/configure.zcml'
--- lib/canonical/launchpad/webapp/configure.zcml	2010-10-14 23:03:41 +0000
+++ lib/canonical/launchpad/webapp/configure.zcml	2010-12-17 21:21:17 +0000
@@ -331,6 +331,10 @@
     <!-- Signal handlers -->
     <subscriber
         for="zope.app.appsetup.IProcessStartingEvent"
+        handler="canonical.launchpad.webapp.sighup.setup_sighup"
+        />
+    <subscriber
+        for="zope.app.appsetup.IProcessStartingEvent"
         handler="canonical.launchpad.webapp.sigusr1.setup_sigusr1"
         />
     <subscriber
@@ -431,4 +435,11 @@
         factory="canonical.launchpad.webapp.namespace.FormNamespaceView"
         />
 
+    <!-- Registrations to support +haproxy status url. -->
+    <browser:page
+        for="canonical.launchpad.webapp.interfaces.ILaunchpadRoot"
+        name="+haproxy"
+        permission="zope.Public"
+        class="canonical.launchpad.webapp.haproxy.HAProxyStatusView"
+        />
 </configure>

=== modified file 'lib/canonical/launchpad/webapp/dbpolicy.py'
--- lib/canonical/launchpad/webapp/dbpolicy.py	2010-11-16 06:29:36 +0000
+++ lib/canonical/launchpad/webapp/dbpolicy.py	2010-12-17 21:21:17 +0000
@@ -193,13 +193,13 @@
 def LaunchpadDatabasePolicyFactory(request):
     """Return the Launchpad IDatabasePolicy for the current appserver state.
     """
-    # We need to select a non-load balancing DB policy for +opstats so
+    # We need to select a non-load balancing DB policy for some status URLs so
     # it doesn't query the DB for lag information (this page should not
     # hit the database at all). We haven't traversed yet, so we have
     # to sniff the request this way.  Even though PATH_INFO is always
     # present in real requests, we need to tread carefully (``get``) because
     # of test requests in our automated tests.
-    if request.get('PATH_INFO') == u'/+opstats':
+    if request.get('PATH_INFO') in [u'/+opstats', u'/+haproxy']:
         return DatabaseBlockedPolicy(request)
     elif is_read_only():
         return ReadOnlyLaunchpadDatabasePolicy(request)

=== added file 'lib/canonical/launchpad/webapp/haproxy.py'
--- lib/canonical/launchpad/webapp/haproxy.py	1970-01-01 00:00:00 +0000
+++ lib/canonical/launchpad/webapp/haproxy.py	2010-12-17 21:21:17 +0000
@@ -0,0 +1,63 @@
+# Copyright 2010 Canonical Ltd.  This software is licensed under the
+# GNU Affero General Public License version 3 (see the file LICENSE).
+
+"""Implementation of the HAProxy probe URL."""
+
+
+__metaclass__ = type
+__all__ = [
+    'HAProxyStatusView',
+    'set_going_down_flag',
+    'switch_going_down_flag',
+    ]
+
+# This is the global flag, when this is True, the HAProxy view
+# will return 500, it returns 200 otherwise.
+going_down_flag = False
+
+
+def set_going_down_flag(state):
+    """Sets the going_down_flag to state"""
+    global going_down_flag
+    going_down_flag = state
+
+
+def switch_going_down_flag():
+    """Switch the going down flag.
+
+    This is is registered as a signal handler for HUP.
+    """
+    global going_down_flag
+    going_down_flag = not going_down_flag
+
+
+class HAProxyStatusView:
+    """
+    View implementing the HAProxy probe URL.
+
+    HAProxy doesn't support programmatically taking servers in and our of
+    rotation. It does however uses a probe URL that it scans regularly to see
+    if the server is still alive. The /+haproxy is that URL for us.
+
+    If it times out or returns a non-200 status, it will take the server out
+    of rotation, until the probe works again.
+
+    This allow us to send a signal (HUP) to the app servers when we want to
+    restart them. The probe URL will then return 500 and the app server will
+    be taken out of rotation. Once HAProxy reports that all connections to the
+    app servers are finished, we can restart the server safely.
+    """
+
+    def __init__(self, context, request):
+        self.context = context
+        self.request = request
+
+    def __call__(self):
+        """Return 200 or 500 depending on the global flag."""
+        global going_down_flag
+        if going_down_flag:
+            self.request.response.setStatus(500)
+            return u"May day! May day! I'm going down. Stop the flood gate."
+        else:
+            self.request.response.setStatus(200)
+            return u"Everything is groovy. Keep them coming!"

=== modified file 'lib/canonical/launchpad/webapp/publication.py'
--- lib/canonical/launchpad/webapp/publication.py	2010-11-02 01:34:05 +0000
+++ lib/canonical/launchpad/webapp/publication.py	2010-12-17 21:21:17 +0000
@@ -456,9 +456,11 @@
         # And spit the pageid out to our tracelog.
         tracelog(request, 'p', pageid)
 
-        # For opstats, where we really don't want to have any DB access at all,
-        # ensure that all flag lookups will stop early.
-        if pageid in ('RootObject:OpStats', 'RootObject:+opstats'):
+        # For status URLs, where we really don't want to have any DB access
+        # at all, ensure that all flag lookups will stop early.
+        if pageid in (
+            'RootObject:OpStats', 'RootObject:+opstats',
+            'RootObject:+haproxy'):
             request.features = NullFeatureController()
             features.per_thread.features = request.features
 
@@ -466,8 +468,9 @@
         # to control the hard timeout, and they trigger DB access, but our
         # DB tracers are not safe for reentrant use, so we must do this
         # outside of the SQL stack. We must also do it after traversal so that
-        # the view is known and can be used in scope resolution. As we actually
-        # stash the pageid after afterTraversal, we need to do this even later.
+        # the view is known and can be used in scope resolution. As we
+        # actually stash the pageid after afterTraversal, we need to do this
+        # even later.
         da.set_permit_timeout_from_features(True)
         da._get_request_timeout()
 

=== added file 'lib/canonical/launchpad/webapp/sighup.py'
--- lib/canonical/launchpad/webapp/sighup.py	1970-01-01 00:00:00 +0000
+++ lib/canonical/launchpad/webapp/sighup.py	2010-12-17 21:21:17 +0000
@@ -0,0 +1,26 @@
+# Copyright 2010 Canonical Ltd.  This software is licensed under the
+# GNU Affero General Public License version 3 (see the file LICENSE).
+
+"""Signal handler for SIGHUP."""
+
+__metaclass__ = type
+__all__ = []
+
+import logging
+import signal
+
+from canonical.launchpad.webapp import haproxy
+
+
+def sighup_handler(signum, frame):
+    """Switch the state of the HAProxy going_down flag."""
+    haproxy.switch_going_down_flag()
+    logging.getLogger('sighup').info(
+        'Received SIGHUP, swiched going_down flag to %s' %
+        haproxy.going_down_flag)
+
+
+def setup_sighup(event):
+    """Configure the SIGHUP handler.  Called at startup."""
+    signal.signal(signal.SIGHUP, sighup_handler)
+

=== added file 'lib/canonical/launchpad/webapp/tests/test_haproxy.py'
--- lib/canonical/launchpad/webapp/tests/test_haproxy.py	1970-01-01 00:00:00 +0000
+++ lib/canonical/launchpad/webapp/tests/test_haproxy.py	2010-12-17 21:21:17 +0000
@@ -0,0 +1,49 @@
+# Copyright 2010 Canonical Ltd.  This software is licensed under the
+# GNU Affero General Public License version 3 (see the file LICENSE).
+
+"""Test the haproxy integration view."""
+
+__metaclass__ = type
+__all__ = []
+
+from canonical.testing.layers import FunctionalLayer
+from canonical.launchpad.webapp import haproxy
+from canonical.launchpad.webapp.dbpolicy import (
+    DatabaseBlockedPolicy,
+    LaunchpadDatabasePolicyFactory,
+    )
+from canonical.launchpad.webapp.servers import LaunchpadTestRequest
+
+from zope.app.testing.functional import HTTPCaller
+from lp.testing import TestCase
+
+
+class HAProxyIntegrationTest(TestCase):
+    layer = FunctionalLayer
+
+    def setUp(self):
+        TestCase.setUp(self)
+        self.http = HTTPCaller()
+        self.original_flag = haproxy.going_down_flag
+        self.addCleanup(haproxy.set_going_down_flag, self.original_flag)
+
+    def test_HAProxyStatusView_all_good_returns_200(self):
+        result = self.http(u'GET /+haproxy HTTP/1.0', handle_errors=False)
+        self.assertEquals(200, result.getStatus())
+
+    def test_HAProxyStatusView_going_down_returns_500(self):
+        haproxy.set_going_down_flag(True)
+        result = self.http(u'GET /+haproxy HTTP/1.0', handle_errors=False)
+        self.assertEquals(500, result.getStatus())
+
+    def test_haproxy_url_uses_DatabaseBlocked_policy(self):
+        request = LaunchpadTestRequest(environ={'PATH_INFO': '/+haproxy'})
+        policy = LaunchpadDatabasePolicyFactory(request)
+        self.assertIsInstance(policy, DatabaseBlockedPolicy)
+
+    def test_switch_going_down_flag(self):
+        haproxy.set_going_down_flag(True)
+        haproxy.switch_going_down_flag()
+        self.assertEquals(False, haproxy.going_down_flag)
+        haproxy.switch_going_down_flag()
+        self.assertEquals(True, haproxy.going_down_flag)

=== added file 'lib/canonical/launchpad/webapp/tests/test_sighup.py'
--- lib/canonical/launchpad/webapp/tests/test_sighup.py	1970-01-01 00:00:00 +0000
+++ lib/canonical/launchpad/webapp/tests/test_sighup.py	2010-12-17 21:21:17 +0000
@@ -0,0 +1,36 @@
+# Copyright 2010 Canonical Ltd.  This software is licensed under the
+# GNU Affero General Public License version 3 (see the file LICENSE).
+
+"""Test the SIGHUP signal handler."""
+
+__metaclass__ = type
+
+import os
+import signal
+
+from canonical.testing.layers import FunctionalLayer
+from canonical.launchpad.webapp import haproxy
+from canonical.launchpad.webapp import sighup
+from lp.testing import TestCase
+
+
+class SIGHUPTestCase(TestCase):
+    layer = FunctionalLayer
+
+    def setUp(self):
+        TestCase.setUp(self)
+        self.original_handler = signal.getsignal(signal.SIGHUP)
+        self.addCleanup(signal.signal, signal.SIGHUP, self.original_handler)
+        sighup.setup_sighup(None)
+
+        self.original_flag = haproxy.going_down_flag
+        self.addCleanup(haproxy.set_going_down_flag, self.original_flag)
+
+    def test_sighup(self):
+        # Sending SIGHUP should switch the PID
+        os.kill(os.getpid(), signal.SIGHUP)
+        self.assertEquals(not self.original_flag, haproxy.going_down_flag)
+
+        # Sending again should switch again.
+        os.kill(os.getpid(), signal.SIGHUP)
+        self.assertEquals(self.original_flag, haproxy.going_down_flag)

=== modified file 'utilities/page-performance-report.ini'
--- utilities/page-performance-report.ini	2010-11-05 19:58:32 +0000
+++ utilities/page-performance-report.ini	2010-12-17 21:21:17 +0000
@@ -3,7 +3,7 @@
 # Remeber to quote ?, ., + & ? characters to match litterally.
 # 'kodos' is useful for interactively testing regular expressions.
 All Launchpad=.
-All launchpad except opstats=(?<!\+opstats)$
+All launchpad except opstats=(?<!\+opstats|\+haproxy)$
 
 Launchpad Frontpage=^https?://launchpad\.[^/]+/?$