sts-sponsors team mailing list archive, Message #00300
[Bug 1740892] Re: corosync upgrade on 2018-01-02 caused pacemaker to fail
My findings from today:
1) This situation has always existed on Trusty, afaict. Removing the
regression-related tag.
2) There are 24 possible combinations to consider for each release
(some are green by definition, and some are not achievable, but I'm
including them for completeness): `service {corosync,pacemaker}
{start,stop,restart}`, where each of corosync and pacemaker can begin
in one of {started,stopped}; 3 actions * 2 services * 2 corosync
states * 2 pacemaker states = 24.
3) For now, I'm ignoring the case of pacemaker configured to use
heartbeat, as that is not the default in the current Ubuntu release.
4) On Trusty, 6 of those combinations are not possible by default
(corosync stopped but pacemaker running).
5) On Trusty, the only failing situation I can provoke is `service
corosync restart` when corosync and pacemaker are both already running.
In all other 17 cases, the expected result is obtained with the
existing packages.
6) I have submitted an MP to the Ubuntu Server Git repository for
general review and submitted a build to a PPA at:
https://launchpad.net/~nacc/+archive/ubuntu/lp1740892/, which adds a
manual SysV start of pacemaker in corosync's SysV restart logic, if
pacemaker was running before corosync was restarted. I think this is the
least likely path to affect any existing configurations. In
particular, this does not affect the corosync start path, which may or
may not have previously started pacemaker (that is a local configuration
decision, afaict).
7) In my investigation (this relates to xnox's and others' comments),
there is no SysV link between pacemaker and corosync. Instead, pacemaker
itself quits due to not finding corosync if it's not already started.
This is why the SysV do_stop routine for corosync ends up resulting in
pacemaker stopping.
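For reference, the change described in (6) can be sketched as the
fragment below. This is an illustrative sketch only: the function
names (do_stop, do_start, do_restart) follow common SysV init-script
conventions, and the actual diff in the MP may differ in detail.

```shell
# Sketch of corosync's SysV restart path with the proposed change.
# do_stop/do_start here are stand-ins for the script's existing logic.
do_stop()  { service corosync stop;  }
do_start() { service corosync start; }

do_restart() {
    pacemaker_was_running=no
    # Record whether pacemaker was up before corosync goes down.
    if service pacemaker status >/dev/null 2>&1; then
        pacemaker_was_running=yes
    fi
    do_stop     # pacemaker exits as a side effect (see point 7)
    do_start
    # Re-start pacemaker only if it was running beforehand, leaving the
    # plain corosync start path untouched.
    if [ "$pacemaker_was_running" = yes ]; then
        service pacemaker start
    fi
}
```

Keeping the check inside the restart action (rather than the start
action) is what limits the blast radius to upgrades/restarts only.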
--
You received this bug notification because you are a member of STS
Sponsors, which is subscribed to the bug report.
https://bugs.launchpad.net/bugs/1740892
Title:
corosync upgrade on 2018-01-02 caused pacemaker to fail
Status in OpenStack hacluster charm:
Invalid
Status in corosync package in Ubuntu:
In Progress
Status in corosync source package in Trusty:
Confirmed
Status in corosync source package in Xenial:
Confirmed
Status in corosync source package in Artful:
Confirmed
Status in corosync source package in Bionic:
In Progress
Bug description:
During upgrades on 2018-01-02, corosync and its libraries were upgraded:
(from a trusty/mitaka cloud)
Upgrade: libcmap4:amd64 (2.3.3-1ubuntu3, 2.3.3-1ubuntu4),
corosync:amd64 (2.3.3-1ubuntu3, 2.3.3-1ubuntu4), libcfg6:amd64
(2.3.3-1ubuntu3, 2.3.3-1ubuntu4), libcpg4:amd64 (2.3.3-1ubuntu3,
2.3.3-1ubuntu4), libquorum5:amd64 (2.3.3-1ubuntu3, 2.3.3-1ubuntu4),
libcorosync-common4:amd64 (2.3.3-1ubuntu3, 2.3.3-1ubuntu4),
libsam4:amd64 (2.3.3-1ubuntu3, 2.3.3-1ubuntu4), libvotequorum6:amd64
(2.3.3-1ubuntu3, 2.3.3-1ubuntu4), libtotem-pg5:amd64 (2.3.3-1ubuntu3,
2.3.3-1ubuntu4)
During this process, it appears that the pacemaker service is
restarted and errors out:
syslog:Jan 2 16:09:33 juju-machine-0-lxc-4 pacemakerd[1994]: notice: crm_update_peer_state: pcmk_quorum_notification: Node juju-machine-1-lxc-3[1001] - state is now lost (was member)
syslog:Jan 2 16:09:34 juju-machine-0-lxc-4 pacemakerd[1994]: notice: crm_update_peer_state: pcmk_quorum_notification: Node juju-machine-1-lxc-3[1001] - state is now member (was lost)
syslog:Jan 2 16:14:32 juju-machine-0-lxc-4 pacemakerd[1994]: error: cfg_connection_destroy: Connection destroyed
syslog:Jan 2 16:14:32 juju-machine-0-lxc-4 pacemakerd[1994]: notice: pcmk_shutdown_worker: Shuting down Pacemaker
syslog:Jan 2 16:14:32 juju-machine-0-lxc-4 pacemakerd[1994]: notice: stop_child: Stopping crmd: Sent -15 to process 2050
syslog:Jan 2 16:14:32 juju-machine-0-lxc-4 pacemakerd[1994]: error: pcmk_cpg_dispatch: Connection to the CPG API failed: Library error (2)
syslog:Jan 2 16:14:32 juju-machine-0-lxc-4 pacemakerd[1994]: error: mcp_cpg_destroy: Connection destroyed
xenial/ocata is also affected.
To manage notifications about this bug go to:
https://bugs.launchpad.net/charm-hacluster/+bug/1740892/+subscriptions