sts-sponsors team mailing list archive
Message #00534
[Bug 1740892] Re: corosync upgrade on 2018-01-02 caused pacemaker to fail
[Artful (pre-sru)]
# dpkg -l | egrep "corosync|pacemaker"
ii corosync 2.4.2-3build1 amd64 cluster engine daemon and utilities
ii crmsh 2.3.2-1 amd64 CRM shell for the pacemaker cluster manager
ii libcorosync-common4:amd64 2.4.2-3build1 amd64 cluster engine common library
ii pacemaker 1.1.16-1ubuntu1 amd64 cluster resource manager
ii pacemaker-cli-utils 1.1.16-1ubuntu1 amd64 cluster resource manager command line utilities
ii pacemaker-common 1.1.16-1ubuntu1 all cluster resource manager common files
ii pacemaker-resource-agents 1.1.16-1ubuntu1 all cluster resource manager general resource agents
# systemctl status corosync | egrep -i "Active:|pid"
Active: active (running) since Mon 2018-02-26 15:23:37 UTC; 15min ago
Main PID: 8943 (corosync)
# systemctl status pacemaker | egrep -i "Active:|pid"
Active: active (running) since Mon 2018-02-26 15:23:39 UTC; 15min ago
Main PID: 9033 (pacemakerd)
# apt-get install corosync -y
Reading package lists... Done
Building dependency tree
Reading state information... Done
The following package was automatically installed and is no longer required:
libfreetype6
Use 'apt autoremove' to remove it.
The following additional packages will be installed:
pacemaker
Suggested packages:
fence-agents
The following packages will be upgraded:
corosync pacemaker
2 upgraded, 0 newly installed, 0 to remove and 41 not upgraded.
Need to get 768 kB of archives.
After this operation, 11.3 kB of additional disk space will be used.
Get:1 http://ppa.launchpad.net/slashd/lp1740892/ubuntu artful/main amd64 pacemaker amd64 1.1.16-1ubuntu2 [389 kB]
Get:2 http://ppa.launchpad.net/slashd/lp1740892/ubuntu artful/main amd64 corosync amd64 2.4.2-3ubuntu0.17.10.1 [379 kB]
Fetched 768 kB in 1s (420 kB/s)
(Reading database ... 29268 files and directories currently installed.)
Preparing to unpack .../pacemaker_1.1.16-1ubuntu2_amd64.deb ...
Unpacking pacemaker (1.1.16-1ubuntu2) over (1.1.16-1ubuntu1) ...
Preparing to unpack .../corosync_2.4.2-3ubuntu0.17.10.1_amd64.deb ...
Unpacking corosync (2.4.2-3ubuntu0.17.10.1) over (2.4.2-3build1) ...
Processing triggers for ureadahead (0.100.0-20) ...
Processing triggers for systemd (234-2ubuntu12.1) ...
Setting up corosync (2.4.2-3ubuntu0.17.10.1) ...
Processing triggers for man-db (2.7.6.1-2) ...
Setting up pacemaker (1.1.16-1ubuntu2) ...
# dpkg -l | egrep "corosync|pacemaker"
ii corosync 2.4.2-3ubuntu0.17.10.1 amd64 cluster engine daemon and utilities
ii crmsh 2.3.2-1 amd64 CRM shell for the pacemaker cluster manager
ii libcorosync-common4:amd64 2.4.2-3build1 amd64 cluster engine common library
ii pacemaker 1.1.16-1ubuntu2 amd64 cluster resource manager
ii pacemaker-cli-utils 1.1.16-1ubuntu1 amd64 cluster resource manager command line utilities
ii pacemaker-common 1.1.16-1ubuntu1 all cluster resource manager common files
ii pacemaker-resource-agents 1.1.16-1ubuntu1 all cluster resource manager general resource agents
# systemctl status corosync | egrep -i "Active:|pid"
Active: active (running) since Mon 2018-02-26 15:40:04 UTC; 13s ago
Main PID: 9814 (corosync)
# systemctl status pacemaker | egrep -i "Active:|pid"
Active: active (running) since Mon 2018-02-26 15:40:05 UTC; 14s ago
Main PID: 9996 (pacemakerd)
--
You received this bug notification because you are a member of STS
Sponsors, which is subscribed to the bug report.
https://bugs.launchpad.net/bugs/1740892
Title:
corosync upgrade on 2018-01-02 caused pacemaker to fail
Status in OpenStack hacluster charm:
Invalid
Status in corosync package in Ubuntu:
Fix Released
Status in pacemaker package in Ubuntu:
Fix Released
Status in corosync source package in Trusty:
Won't Fix
Status in pacemaker source package in Trusty:
Won't Fix
Status in corosync source package in Xenial:
In Progress
Status in pacemaker source package in Xenial:
In Progress
Status in corosync source package in Artful:
In Progress
Status in pacemaker source package in Artful:
In Progress
Status in corosync source package in Bionic:
Fix Released
Status in corosync package in Debian:
New
Bug description:
[Impact]
When corosync and pacemaker are both installed, a corosync upgrade
causes pacemaker to fail. pacemaker then has to be restarted manually;
it does not recover by itself.
[Test Case]
1) Have corosync (< 2.3.5-3ubuntu2) and pacemaker (< 1.1.14-2ubuntu1.3) installed
2) Make sure corosync and pacemaker are running, using systemctl status.
3) Upgrade corosync.
4) Check corosync and pacemaker with systemctl status again.
You will notice that pacemaker is dead (inactive) and does not recover
unless a systemctl start pacemaker is done manually.
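The steps above could be scripted roughly as follows. This is only a
minimal sketch: report_units is a hypothetical helper name, not part of
the fix, and it relies on "systemctl is-active --quiet" returning
non-zero for a unit that is not active.

```shell
#!/bin/sh
# Hypothetical helper: report whether each named systemd unit is active.
# Prints one line per unit and returns non-zero if any unit is not active.
report_units() {
    rc=0
    for unit in "$@"; do
        if systemctl is-active --quiet "$unit"; then
            echo "$unit: active"
        else
            echo "$unit: NOT active"
            rc=1
        fi
    done
    return "$rc"
}

# Intended use on a test node, as root (not executed by this block):
#   report_units corosync pacemaker    # step 2: both should be active
#   apt-get install -y corosync        # step 3: upgrade corosync
#   report_units corosync pacemaker    # step 4: affected versions show
#                                      #         "pacemaker: NOT active"
```

With the fixed packages, the second report_units call would be expected
to show both units as active, as in the Artful session above.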
[Regression Potential]
Regression potential is low; the change does not touch corosync/pacemaker
core functionality. The patch makes things go more smoothly at the
packaging level during a corosync upgrade where pacemaker is
installed/involved.
This is particularly useful in situations where the system
has "unattended-upgrades" enabled (software upgrades without
supervision) and no sysadmin is available to start pacemaker manually
because the upgrade is not part of a scheduled maintenance.
[Other Info]
XENIAL Merge-proposal:
https://code.launchpad.net/~nacc/ubuntu/+source/corosync/+git/corosync/+merge/336338
https://code.launchpad.net/~nacc/ubuntu/+source/pacemaker/+git/pacemaker/+merge/336339
[Original Description]
During upgrades on 2018-01-02, corosync and its libs were upgraded:
(from a trusty/mitaka cloud)
Upgrade: libcmap4:amd64 (2.3.3-1ubuntu3, 2.3.3-1ubuntu4),
corosync:amd64 (2.3.3-1ubuntu3, 2.3.3-1ubuntu4), libcfg6:amd64
(2.3.3-1ubuntu3, 2.3.3-1ubuntu4), libcpg4:amd64 (2.3.3-1ubuntu3,
2.3.3-1ubuntu4), libquorum5:amd64 (2.3.3-1ubuntu3, 2.3.3-1ubuntu4),
libcorosync-common4:amd64 (2.3.3-1ubuntu3, 2.3.3-1ubuntu4),
libsam4:amd64 (2.3.3-1ubuntu3, 2.3.3-1ubuntu4), libvotequorum6:amd64
(2.3.3-1ubuntu3, 2.3.3-1ubuntu4), libtotem-pg5:amd64 (2.3.3-1ubuntu3,
2.3.3-1ubuntu4)
During this process, it appears that the pacemaker service is restarted
and reports errors:
syslog:Jan 2 16:09:33 juju-machine-0-lxc-4 pacemakerd[1994]: notice: crm_update_peer_state: pcmk_quorum_notification: Node juju-machine-1-lxc-3[1001] - state is now lost (was member)
syslog:Jan 2 16:09:34 juju-machine-0-lxc-4 pacemakerd[1994]: notice: crm_update_peer_state: pcmk_quorum_notification: Node juju-machine-1-lxc-3[1001] - state is now member (was lost)
syslog:Jan 2 16:14:32 juju-machine-0-lxc-4 pacemakerd[1994]: error: cfg_connection_destroy: Connection destroyed
syslog:Jan 2 16:14:32 juju-machine-0-lxc-4 pacemakerd[1994]: notice: pcmk_shutdown_worker: Shuting down Pacemaker
syslog:Jan 2 16:14:32 juju-machine-0-lxc-4 pacemakerd[1994]: notice: stop_child: Stopping crmd: Sent -15 to process 2050
syslog:Jan 2 16:14:32 juju-machine-0-lxc-4 pacemakerd[1994]: error: pcmk_cpg_dispatch: Connection to the CPG API failed: Library error (2)
syslog:Jan 2 16:14:32 juju-machine-0-lxc-4 pacemakerd[1994]: error: mcp_cpg_destroy: Connection destroyed
This also affected xenial/ocata.
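To look for the same symptom on another node, the syslog can be
filtered for the pacemakerd messages shown above. The following is a
rough sketch only: pacemakerd_failures is a hypothetical helper name,
the pattern is derived solely from the excerpt in this report, and the
log path in the usage comment is an assumption.

```shell
#!/bin/sh
# Hypothetical helper: read syslog-style lines on stdin and keep only the
# pacemakerd lines that report an error or the start of a shutdown.
pacemakerd_failures() {
    grep -E 'pacemakerd\[[0-9]+\]: +(error: |notice: +pcmk_shutdown_worker:)'
}

# Intended use (path is an assumption, not executed by this block):
#   pacemakerd_failures < /var/log/syslog
```

Run against the excerpt above, this would keep the cfg_connection_destroy,
pcmk_shutdown_worker, pcmk_cpg_dispatch and mcp_cpg_destroy lines and
drop the peer-state and stop_child notices.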
To manage notifications about this bug go to:
https://bugs.launchpad.net/charm-hacluster/+bug/1740892/+subscriptions