← Back to team overview

launchpad-reviewers team mailing list archive

[Merge] ~cjwatson/launchpad:charm-admin-pgkill into launchpad:master

 

Colin Watson has proposed merging ~cjwatson/launchpad:charm-admin-pgkill into launchpad:master.

Commit message:
charm/launchpad-admin: Run pgkillactive/pgkillidle

Requested reviews:
  Launchpad code reviewers (launchpad-reviewers)

For more details, see:
https://code.launchpad.net/~cjwatson/launchpad/+git/launchpad/+merge/449194

On production, we run `pgkillactive` and `pgkillidle` to control connections that hang around for too long.  (The `postgresql` charm has a similar facility, but it isn't fine-grained enough for us.)  This currently runs on bare metal and needs to be moved into a charm.  The `launchpad-admin` charm seems to be a reasonable choice: it has database admin credentials and a copy of the Launchpad tree, and we can easily set up a cron job exactly the way we want.

We have to do some regex-mangling to match `pgkillactive`'s current interface; I thought it better not to block moving this function into a charm on improving that interface.
-- 
Your team Launchpad code reviewers is requested to review the proposed merge of ~cjwatson/launchpad:charm-admin-pgkill into launchpad:master.
diff --git a/charm/launchpad-admin/config.yaml b/charm/launchpad-admin/config.yaml
new file mode 100644
index 0000000..055f4a8
--- /dev/null
+++ b/charm/launchpad-admin/config.yaml
@@ -0,0 +1,45 @@
+options:
+  active_appserver_transaction_seconds:
+    type: int
+    description: >
+      Number of seconds after which to kill active transactions matched by
+      `active_appserver_transaction_users`.  (Transactions are killed by a
+      cron job, so this only sets a lower bound.)
+    default: 120
+  active_appserver_transaction_users:
+    type: string
+    description: >
+      YAML-formatted list of database user names whose transactions should
+      be killed after `active_appserver_transaction_seconds`.
+    default: |
+      - "juju_launchpad-appserver"
+  active_long_transaction_seconds:
+    type: int
+    description: >
+      Number of seconds after which to kill active transactions not matched
+      by `active_long_transaction_ignore_users`.  (Transactions are killed
+      by a cron job, so this only sets a lower bound.)
+    # This timeout should ideally be a lot less (e.g. 15 minutes).
+    default: 12600
+  active_long_transaction_ignore_users:
+    type: string
+    description: >
+      YAML-formatted list of database user names whose transactions should
+      be left alone.  Transactions from any other user names will be killed
+      after `active_long_transaction_seconds`.
+    default: |
+      - "postgres"
+  idle_transaction_seconds:
+    type: int
+    description: >
+      Number of seconds after which to kill idle transactions not matched by
+      `idle_transaction_ignore_users`.  (Transactions are killed by a cron
+      job, so this only sets a lower bound.)
+    default: 1800
+  idle_transaction_ignore_users:
+    type: string
+    description: >
+      YAML-formatted list of database user names whose transactions should
+      be left alone.  Transactions from any other user names will be killed
+      after idling in a transaction for `idle_transaction_seconds`.
+    default: "[]"
diff --git a/charm/launchpad-admin/reactive/launchpad-admin.py b/charm/launchpad-admin/reactive/launchpad-admin.py
index 0d3d311..5df1b29 100644
--- a/charm/launchpad-admin/reactive/launchpad-admin.py
+++ b/charm/launchpad-admin/reactive/launchpad-admin.py
@@ -2,12 +2,14 @@
 # GNU Affero General Public License version 3 (see the file LICENSE).
 
 import os.path
+import re
 import subprocess
 
+import yaml
 from charmhelpers.core import hookenv, host, templating
 from charms.launchpad.base import configure_email, get_service_config
 from charms.launchpad.db import strip_dsn_authentication, update_pgpass
-from charms.launchpad.payload import configure_lazr, home_dir
+from charms.launchpad.payload import configure_cron, configure_lazr, home_dir
 from charms.reactive import (
     endpoint_from_flag,
     remove_state,
@@ -102,12 +104,32 @@ def configure():
     config["db_session_primary"] = strip_password(session_db_primary)
     config["db_session"] = strip_dsn_authentication(session_db_primary)
     config["db_session_user"] = parse_dsn(session_db_primary)["user"]
+    # utilities/pgkillactive.py takes regexes for users whose connections it
+    # should kill, but we'd rather just have the operator of the charm
+    # supply lists of user names.  Turn these into positive/negative regexes
+    # as appropriate.
+    config["active_appserver_transaction_user_regex"] = "^(%s)$" % "|".join(
+        re.escape(user)
+        for user in yaml.safe_load(
+            config["active_appserver_transaction_users"]
+        )
+    )
+    config["active_long_transaction_user_regex"] = "^(?!%s)" % "|".join(
+        re.escape(user) + "$"
+        for user in yaml.safe_load(
+            config["active_long_transaction_ignore_users"]
+        )
+    )
+    config["idle_transaction_ignore_users"] = yaml.safe_load(
+        config["idle_transaction_ignore_users"]
+    )
     configure_lazr(
         config,
         "launchpad-admin-lazr.conf",
         "launchpad-admin/launchpad-lazr.conf",
     )
     configure_email(config, "launchpad-admin")
+    configure_cron(config, "crontab.j2")
     templating.render(
         "bash_aliases.j2",
         os.path.join(home_dir(), ".bash_aliases"),
diff --git a/charm/launchpad-admin/templates/crontab.j2 b/charm/launchpad-admin/templates/crontab.j2
new file mode 100644
index 0000000..f16afaf
--- /dev/null
+++ b/charm/launchpad-admin/templates/crontab.j2
@@ -0,0 +1,14 @@
+TZ=UTC
+MAILTO={{ cron_mailto }}
+
+# Kill connections idling in a transaction for too long.
+0,30 * * * * {{ code_dir }}/utilities/pgkillidle.py -s {{ idle_transaction_seconds }} -q {% for user in idle_transaction_ignore_users %}-i {{ user }} {% endfor %}-c '{{ db_admin_primary }}'
+
+# Kill appserver transactions lasting too long (appserver timeout is 65
+# seconds, so these indicate a hung appserver or request handler).
+* * * * * {{ code_dir }}/utilities/pgkillactive.py -q -s {{ active_appserver_transaction_seconds }} -u '{{ active_appserver_transaction_user_regex }}' -c '{{ db_admin_primary }}'
+
+# Kill non-system connections holding open a transaction for too long.
+# These are bugs needing to be fixed.
+15,45 * * * * {{ code_dir }}/utilities/pgkillactive.py -s {{ active_long_transaction_seconds }} -u '{{ active_long_transaction_user_regex }}' -q -c '{{ db_admin_primary }}'
+