sts-sponsors team mailing list archive
Message #00327
[Bug 1740892] Re: corosync upgrade on 2018-01-02 caused pacemaker to fail
[VERIFICATION for XENIAL]
---------------------------------------------
[UPGRADE SCENARIO]
# dpkg -l | egrep "corosync|pacemaker"
ii corosync 2.3.5-3ubuntu2 amd64 cluster engine daemon and utilities
ii crmsh 2.2.0-1 amd64 CRM shell for the pacemaker cluster manager
ii libcorosync-common4:amd64 2.3.5-3ubuntu2 amd64 cluster engine common library
ii pacemaker 1.1.14-2ubuntu1.3 amd64 cluster resource manager
ii pacemaker-cli-utils 1.1.14-2ubuntu1.3 amd64 cluster resource manager command line utilities
ii pacemaker-common 1.1.14-2ubuntu1.3 all cluster resource manager common files
ii pacemaker-resource-agents 1.1.14-2ubuntu1.3 all cluster resource manager general resource agents
# pidof pacemakerd
3647
# pidof corosync
1283
# sudo add-apt-repository ppa:nacc/lp1740892
# sudo apt-get update
# apt-get install corosync -y
Reading package lists... Done
Building dependency tree
Reading state information... Done
The following package was automatically installed and is no longer required:
libfreetype6
Use 'apt autoremove' to remove it.
The following additional packages will be installed:
pacemaker
Suggested packages:
fence-agents
The following packages will be upgraded:
corosync pacemaker
2 upgraded, 0 newly installed, 0 to remove and 43 not upgraded.
Need to get 765 kB of archives.
After this operation, 1024 B of additional disk space will be used.
Get:1 http://ppa.launchpad.net/nacc/lp1740892/ubuntu xenial/main amd64 pacemaker amd64 1.1.14-2ubuntu1.4~ppa3 [403 kB]
Get:2 http://ppa.launchpad.net/nacc/lp1740892/ubuntu xenial/main amd64 corosync amd64 2.3.5-3ubuntu2.1~ppa3 [361 kB]
Fetched 765 kB in 1s (488 kB/s)
(Reading database ... 28089 files and directories currently installed.)
Preparing to unpack .../pacemaker_1.1.14-2ubuntu1.4~ppa3_amd64.deb ...
Unpacking pacemaker (1.1.14-2ubuntu1.4~ppa3) over (1.1.14-2ubuntu1.3) ...
Preparing to unpack .../corosync_2.3.5-3ubuntu2.1~ppa3_amd64.deb ...
Unpacking corosync (2.3.5-3ubuntu2.1~ppa3) over (2.3.5-3ubuntu2) ...
Processing triggers for systemd (229-4ubuntu21) ...
Processing triggers for ureadahead (0.100.0-19) ...
Processing triggers for man-db (2.7.5-1) ...
Setting up corosync (2.3.5-3ubuntu2.1~ppa3) ...
Setting up pacemaker (1.1.14-2ubuntu1.4~ppa3) ...
# dpkg -l | egrep "corosync|pacemaker"
ii corosync 2.3.5-3ubuntu2.1~ppa3 amd64 cluster engine daemon and utilities
ii crmsh 2.2.0-1 amd64 CRM shell for the pacemaker cluster manager
ii libcorosync-common4:amd64 2.3.5-3ubuntu2 amd64 cluster engine common library
ii pacemaker 1.1.14-2ubuntu1.4~ppa3 amd64 cluster resource manager
ii pacemaker-cli-utils 1.1.14-2ubuntu1.3 amd64 cluster resource manager command line utilities
ii pacemaker-common 1.1.14-2ubuntu1.3 all cluster resource manager common files
ii pacemaker-resource-agents 1.1.14-2ubuntu1.3 all cluster resource manager general resource agents
# pidof corosync
4876
# pidof pacemakerd
4951
*** Result: The upgrade scenario ensures that pacemaker is restarted after
the upgrade (the PIDs of both corosync and pacemakerd changed).
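The PID comparison used for this result could be scripted roughly as below. This is only a sketch: the helper name restarted_by_pid is hypothetical and not part of the original verification, and the PIDs are the ones recorded in the transcript above.

```shell
#!/bin/sh
# Hypothetical helper: judge whether a daemon was restarted by comparing the
# PID seen before an upgrade with the PID seen after it.
# Usage: restarted_by_pid OLD_PID NEW_PID
restarted_by_pid() {
    old_pid=$1
    new_pid=$2
    if [ -z "$new_pid" ]; then
        # pidof printed nothing after the upgrade
        echo "not running"
    elif [ "$old_pid" != "$new_pid" ]; then
        echo "restarted"
    else
        echo "unchanged"
    fi
}

# PIDs taken from the transcript above.
restarted_by_pid 3647 4951   # pacemakerd
restarted_by_pid 1283 4876   # corosync
```

On a live node the two arguments would come from running `pidof pacemakerd` before and after the package upgrade.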
---------------------------------------------
[NEW INSTALL SCENARIO]
lgtm +1
# sudo add-apt-repository ppa:nacc/lp1740892
# sudo apt-get update
# sudo apt-get install corosync pacemaker -y
# dpkg -l | egrep "corosync|pacemaker"
ii corosync 2.3.5-3ubuntu2.1~ppa3 amd64 cluster engine daemon and utilities
ii crmsh 2.2.0-1 amd64 CRM shell for the pacemaker cluster manager
ii libcorosync-common4:amd64 2.3.5-3ubuntu2.1~ppa3 amd64 cluster engine common library
ii pacemaker 1.1.14-2ubuntu1.4~ppa3 amd64 cluster resource manager
ii pacemaker-cli-utils 1.1.14-2ubuntu1.4~ppa3 amd64 cluster resource manager command line utilities
ii pacemaker-common 1.1.14-2ubuntu1.4~ppa3 all cluster resource manager common files
ii pacemaker-resource-agents 1.1.14-2ubuntu1.4~ppa3 all cluster resource manager general resource agents
*** Result: No problem observed when both packages are installed for the
first time in Xenial.
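To double-check that every corosync/pacemaker package really came from the PPA, the dpkg listing can be filtered by version suffix. A minimal sketch, assuming the PPA builds are the ones ending in "~ppa3" as in the listing above; the helper name not_at_suffix is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: read `dpkg -l` output on stdin and print every installed
# corosync/pacemaker package whose version does not end in the given suffix.
# Usage: dpkg -l | not_at_suffix SUFFIX
not_at_suffix() {
    awk -v suffix="$1" '
        $1 == "ii" && $2 ~ /corosync|pacemaker/ {
            # Compare the last length(suffix) characters of the version field.
            tail = substr($3, length($3) - length(suffix) + 1)
            if (tail != suffix) print $2, $3
        }'
}

# Two lines from the listing above are fed in; both versions end in ~ppa3,
# so nothing is printed for them.
not_at_suffix "~ppa3" <<'EOF'
ii corosync 2.3.5-3ubuntu2.1~ppa3 amd64 cluster engine daemon and utilities
ii pacemaker 1.1.14-2ubuntu1.4~ppa3 amd64 cluster resource manager
EOF
```

An empty result means all matching packages carry the expected suffix.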
---------------------------------------------
[STOP]
# systemctl stop corosync
# systemctl status pacemaker
● pacemaker.service - Pacemaker High Availability Cluster Manager
Loaded: loaded (/lib/systemd/system/pacemaker.service; enabled; vendor preset: enabled)
Active: inactive (dead) since Mon 2018-01-22 15:01:43 UTC; 1s ago
Main PID: 3417 (code=exited, status=0/SUCCESS)
# systemctl status corosync
● corosync.service - Corosync Cluster Engine
Loaded: loaded (/lib/systemd/system/corosync.service; enabled; vendor preset: enabled)
Active: inactive (dead) since Mon 2018-01-22 15:01:43 UTC; 5s ago
Main PID: 3324 (code=exited, status=0/SUCCESS)
*** Result: Stop action stopped both corosync & pacemaker.
---------------------------------------------
[START]
# systemctl start corosync
# systemctl status corosync
● corosync.service - Corosync Cluster Engine
Loaded: loaded (/lib/systemd/system/corosync.service; enabled; vendor preset: enabled)
Active: active (running) since Mon 2018-01-22 15:08:36 UTC; 26s ago
Main PID: 416 (corosync)
Tasks: 2
Memory: 10.7M
CPU: 258ms
CGroup: /system.slice/corosync.service
└─416 /usr/sbin/corosync -f
# systemctl status pacemaker
● pacemaker.service - Pacemaker High Availability Cluster Manager
Loaded: loaded (/lib/systemd/system/pacemaker.service; enabled; vendor preset: enabled)
Active: inactive (dead) since Mon 2018-01-22 15:08:20 UTC; 1min 7s ago
Process: 394 ExecStart=/usr/sbin/pacemakerd -f (code=exited, status=0/SUCCESS)
Main PID: 394 (code=exited, status=0/SUCCESS)
*** Result: Start action DOESN'T start pacemaker back up.
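The state shown on the "Active:" line can be pulled out of the status output for a quick pass/fail check. A rough sketch; the helper name active_state is hypothetical, and the sample lines are copied from the pacemaker status above. On a live system, `systemctl is-active pacemaker` reports the same state directly.

```shell
#!/bin/sh
# Hypothetical helper: read `systemctl status` output on stdin and print the
# state word that follows "Active:" (for example "active" or "inactive").
active_state() {
    # Default awk field splitting ignores the leading indentation that real
    # systemctl output has in front of "Active:".
    awk '$1 == "Active:" { print $2; exit }'
}

# Sample lines copied from the pacemaker status output above.
active_state <<'EOF'
Loaded: loaded (/lib/systemd/system/pacemaker.service; enabled; vendor preset: enabled)
Active: inactive (dead) since Mon 2018-01-22 15:08:20 UTC; 1min 7s ago
EOF
# prints: inactive
```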
---------------------------------------------
[RESTART]
# systemctl status corosync
● corosync.service - Corosync Cluster Engine
Loaded: loaded (/lib/systemd/system/corosync.service; enabled; vendor preset: enabled)
Active: active (running) since Mon 2018-01-22 15:08:36 UTC; 4min 20s ago
Main PID: 416 (corosync)
# systemctl status pacemaker
● pacemaker.service - Pacemaker High Availability Cluster Manager
Loaded: loaded (/lib/systemd/system/pacemaker.service; enabled; vendor preset: enabled)
Active: active (running) since Mon 2018-01-22 15:12:51 UTC; 7s ago
Main PID: 457 (pacemakerd)
# systemctl restart corosync
# systemctl status corosync
● corosync.service - Corosync Cluster Engine
Loaded: loaded (/lib/systemd/system/corosync.service; enabled; vendor preset: enabled)
Active: active (running) since Mon 2018-01-22 15:13:14 UTC; 15s ago
Main PID: 475 (corosync)
# systemctl status pacemaker
● pacemaker.service - Pacemaker High Availability Cluster Manager
Loaded: loaded (/lib/systemd/system/pacemaker.service; enabled; vendor preset: enabled)
Active: active (running) since Mon 2018-01-22 15:13:14 UTC; 5s ago
Main PID: 477 (pacemakerd)
*** Result: restart action restarted both corosync & pacemaker.
--
https://bugs.launchpad.net/bugs/1740892
Title:
  corosync upgrade on 2018-01-02 caused pacemaker to fail

Status in OpenStack hacluster charm:
  Invalid
Status in corosync package in Ubuntu:
  In Progress
Status in corosync source package in Trusty:
  Confirmed
Status in corosync source package in Xenial:
  Confirmed
Status in corosync source package in Artful:
  Confirmed
Status in corosync source package in Bionic:
  In Progress
Status in corosync package in Debian:
  New
Bug description:
During upgrades on 2018-01-02, corosync and its libs were upgraded:
(from a trusty/mitaka cloud)
Upgrade: libcmap4:amd64 (2.3.3-1ubuntu3, 2.3.3-1ubuntu4),
corosync:amd64 (2.3.3-1ubuntu3, 2.3.3-1ubuntu4), libcfg6:amd64
(2.3.3-1ubuntu3, 2.3.3-1ubuntu4), libcpg4:amd64 (2.3.3-1ubuntu3,
2.3.3-1ubuntu4), libquorum5:amd64 (2.3.3-1ubuntu3, 2.3.3-1ubuntu4),
libcorosync-common4:amd64 (2.3.3-1ubuntu3, 2.3.3-1ubuntu4),
libsam4:amd64 (2.3.3-1ubuntu3, 2.3.3-1ubuntu4), libvotequorum6:amd64
(2.3.3-1ubuntu3, 2.3.3-1ubuntu4), libtotem-pg5:amd64 (2.3.3-1ubuntu3,
2.3.3-1ubuntu4)
During this process, it appears that the pacemaker service is restarted
and it logs errors:
syslog:Jan 2 16:09:33 juju-machine-0-lxc-4 pacemakerd[1994]: notice: crm_update_peer_state: pcmk_quorum_notification: Node juju-machine-1-lxc-3[1001] - state is now lost (was member)
syslog:Jan 2 16:09:34 juju-machine-0-lxc-4 pacemakerd[1994]: notice: crm_update_peer_state: pcmk_quorum_notification: Node juju-machine-1-lxc-3[1001] - state is now member (was lost)
syslog:Jan 2 16:14:32 juju-machine-0-lxc-4 pacemakerd[1994]: error: cfg_connection_destroy: Connection destroyed
syslog:Jan 2 16:14:32 juju-machine-0-lxc-4 pacemakerd[1994]: notice: pcmk_shutdown_worker: Shuting down Pacemaker
syslog:Jan 2 16:14:32 juju-machine-0-lxc-4 pacemakerd[1994]: notice: stop_child: Stopping crmd: Sent -15 to process 2050
syslog:Jan 2 16:14:32 juju-machine-0-lxc-4 pacemakerd[1994]: error: pcmk_cpg_dispatch: Connection to the CPG API failed: Library error (2)
syslog:Jan 2 16:14:32 juju-machine-0-lxc-4 pacemakerd[1994]: error: mcp_cpg_destroy: Connection destroyed
Also affected xenial/ocata