launchpad-reviewers team mailing list archive
-
launchpad-reviewers team
-
Mailing list archive
-
Message #30399
[Merge] ~cjwatson/launchpad:charm-admin-pgkill into launchpad:master
Colin Watson has proposed merging ~cjwatson/launchpad:charm-admin-pgkill into launchpad:master.
Commit message:
charm/launchpad-admin: Run pgkillactive/pgkillidle
Requested reviews:
Launchpad code reviewers (launchpad-reviewers)
For more details, see:
https://code.launchpad.net/~cjwatson/launchpad/+git/launchpad/+merge/449194
On production, we run `pgkillactive` and `pgkillidle` to control connections that hang around for too long. (The `postgresql` charm has a similar facility, but it isn't fine-grained enough for us.) This currently runs on bare metal and needs to be moved into a charm. The `launchpad-admin` charm seems to be a reasonable choice: it has database admin credentials and a copy of the Launchpad tree, and we can easily set up a cron job exactly the way we want.
We have to do some regex-mangling to match `pgkillactive`'s current interface; I thought it better not to block moving this function into a charm on improving that interface.
--
Your team Launchpad code reviewers is requested to review the proposed merge of ~cjwatson/launchpad:charm-admin-pgkill into launchpad:master.
diff --git a/charm/launchpad-admin/config.yaml b/charm/launchpad-admin/config.yaml
new file mode 100644
index 0000000..055f4a8
--- /dev/null
+++ b/charm/launchpad-admin/config.yaml
@@ -0,0 +1,45 @@
+options:
+ active_appserver_transaction_seconds:
+ type: int
+ description: >
+ Number of seconds after which to kill active transactions matched by
+ `active_appserver_transaction_users`. (Transactions are killed by a
+ cron job, so this only sets a lower bound.)
+ default: 120
+ active_appserver_transaction_users:
+ type: string
+ description: >
+ YAML-formatted list of database user names whose transactions should
+ be killed after `active_appserver_transaction_seconds`.
+ default: |
+ - "juju_launchpad-appserver"
+ active_long_transaction_seconds:
+ type: int
+ description: >
+ Number of seconds after which to kill active transactions not matched
+ by `active_long_transaction_ignore_users`. (Transactions are killed
+ by a cron job, so this only sets a lower bound.)
+ # This timeout should ideally be a lot less (e.g. 15 minutes).
+ default: 12600
+ active_long_transaction_ignore_users:
+ type: string
+ description: >
+ YAML-formatted list of database user names whose transactions should
+ be left alone. Transactions from any other user names will be killed
+ after `active_long_transaction_seconds`.
+ default: |
+ - "postgres"
+ idle_transaction_seconds:
+ type: int
+ description: >
+ Number of seconds after which to kill idle transactions not matched by
+ `idle_transaction_ignore_users`. (Transactions are killed by a cron
+ job, so this only sets a lower bound.)
+ default: 1800
+ idle_transaction_ignore_users:
+ type: string
+ description: >
+ YAML-formatted list of database user names whose transactions should
+ be left alone. Transactions from any other user names will be killed
+ after idling in a transaction for `idle_transaction_seconds`.
+ default: "[]"
diff --git a/charm/launchpad-admin/reactive/launchpad-admin.py b/charm/launchpad-admin/reactive/launchpad-admin.py
index 0d3d311..5df1b29 100644
--- a/charm/launchpad-admin/reactive/launchpad-admin.py
+++ b/charm/launchpad-admin/reactive/launchpad-admin.py
@@ -2,12 +2,14 @@
# GNU Affero General Public License version 3 (see the file LICENSE).
import os.path
+import re
import subprocess
+import yaml
from charmhelpers.core import hookenv, host, templating
from charms.launchpad.base import configure_email, get_service_config
from charms.launchpad.db import strip_dsn_authentication, update_pgpass
-from charms.launchpad.payload import configure_lazr, home_dir
+from charms.launchpad.payload import configure_cron, configure_lazr, home_dir
from charms.reactive import (
endpoint_from_flag,
remove_state,
@@ -102,12 +104,32 @@ def configure():
config["db_session_primary"] = strip_password(session_db_primary)
config["db_session"] = strip_dsn_authentication(session_db_primary)
config["db_session_user"] = parse_dsn(session_db_primary)["user"]
+ # utilities/pgkillactive.py takes regexes for users whose connections it
+ # should kill, but we'd rather just have the operator of the charm
+ # supply lists of user names. Turn these into positive/negative regexes
+ # as appropriate.
+ config["active_appserver_transaction_user_regex"] = "^(%s)$" % "|".join(
+ re.escape(user)
+ for user in yaml.safe_load(
+ config["active_appserver_transaction_users"]
+ )
+ )
+ config["active_long_transaction_user_regex"] = "^(?!%s)" % "|".join(
+ re.escape(user) + "$"
+ for user in yaml.safe_load(
+ config["active_long_transaction_ignore_users"]
+ )
+ )
+ config["idle_transaction_ignore_users"] = yaml.safe_load(
+ config["idle_transaction_ignore_users"]
+ )
configure_lazr(
config,
"launchpad-admin-lazr.conf",
"launchpad-admin/launchpad-lazr.conf",
)
configure_email(config, "launchpad-admin")
+ configure_cron(config, "crontab.j2")
templating.render(
"bash_aliases.j2",
os.path.join(home_dir(), ".bash_aliases"),
diff --git a/charm/launchpad-admin/templates/crontab.j2 b/charm/launchpad-admin/templates/crontab.j2
new file mode 100644
index 0000000..f16afaf
--- /dev/null
+++ b/charm/launchpad-admin/templates/crontab.j2
@@ -0,0 +1,14 @@
+TZ=UTC
+MAILTO={{ cron_mailto }}
+
+# Kill connections idling in a transaction for too long.
+0,30 * * * * {{ code_dir }}/utilities/pgkillidle.py -s {{ idle_transaction_seconds }} -q {% for user in idle_transaction_ignore_users %}-i {{ user }} {% endfor %}-c '{{ db_admin_primary }}'
+
+# Kill appserver transactions lasting too long (appserver timeout is 65
+# seconds, so these indicate a hung appserver or request handler).
+* * * * * {{ code_dir }}/utilities/pgkillactive.py -q -s {{ active_appserver_transaction_seconds }} -u '{{ active_appserver_transaction_user_regex }}' -c '{{ db_admin_primary }}'
+
+# Kill non-system connections holding open a transaction for too long.
+# These are bugs needing to be fixed.
+15,45 * * * * {{ code_dir }}/utilities/pgkillactive.py -s {{ active_long_transaction_seconds }} -u '{{ active_long_transaction_user_regex }}' -q -c '{{ db_admin_primary }}'
+