sts-sponsors team mailing list archive

[Bug 1740892] Re: corosync upgrade on 2018-01-02 caused pacemaker to fail

 

[VERIFICATION ARTFUL]

The upgrade went well and restarted pacemaker on a corosync
installation, as it should.

# systemctl status corosync
   Active: active (running) since Thu 2018-03-01 15:18:08 UTC; 2min 37s ago
 Main PID: 2366 (corosync)

# systemctl status pacemaker
   Active: active (running) since Thu 2018-03-01 15:18:10 UTC; 2min 44s ago
 Main PID: 2456 (pacemakerd)

# apt-get install corosync -y
Reading package lists... Done
Building dependency tree       
Reading state information... Done
The following package was automatically installed and is no longer required:
  libfreetype6
Use 'apt autoremove' to remove it.
The following additional packages will be installed:
  pacemaker
Suggested packages:
  fence-agents
The following packages will be upgraded:
  corosync pacemaker
2 upgraded, 0 newly installed, 0 to remove and 36 not upgraded.
Need to get 486 kB of archives.
After this operation, 11.3 kB of additional disk space will be used.
Get:1 http://archive.ubuntu.com/ubuntu artful-proposed/main amd64 pacemaker amd64 1.1.17+really1.1.16-1ubuntu2 [314 kB]
Get:2 http://archive.ubuntu.com/ubuntu artful-proposed/main amd64 corosync amd64 2.4.2-3ubuntu0.17.10.1 [172 kB]
Fetched 486 kB in 2s (239 kB/s)  
(Reading database ... 29280 files and directories currently installed.)
Preparing to unpack .../pacemaker_1.1.17+really1.1.16-1ubuntu2_amd64.deb ...
Unpacking pacemaker (1.1.17+really1.1.16-1ubuntu2) over (1.1.16-1ubuntu1) ...
Preparing to unpack .../corosync_2.4.2-3ubuntu0.17.10.1_amd64.deb ...
Unpacking corosync (2.4.2-3ubuntu0.17.10.1) over (2.4.2-3build1) ...
Processing triggers for ureadahead (0.100.0-20) ...
Processing triggers for systemd (234-2ubuntu12.1) ...
Setting up corosync (2.4.2-3ubuntu0.17.10.1) ...
Processing triggers for man-db (2.7.6.1-2) ...
Setting up pacemaker (1.1.17+really1.1.16-1ubuntu2) ...

# systemctl status corosync | egrep "Active:|PID"
   Active: active (running) since Thu 2018-03-01 15:21:05 UTC; 17s ago
 Main PID: 3091 (corosync)

# systemctl status pacemaker | egrep "Active:|PID"
   Active: active (running) since Thu 2018-03-01 15:21:05 UTC; 21s ago
 Main PID: 3273 (pacemakerd)
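
The restart can also be confirmed by comparing the "Main PID" values
captured before and after the upgrade (2366 vs 3091 for corosync, 2456
vs 3273 for pacemaker above). A minimal sketch of that comparison
follows; main_pid and was_restarted are hypothetical helper names, not
part of any package, and the parsing assumes the "Main PID:" line format
shown in the output above.

```shell
#!/bin/sh
# Hypothetical helper: print the number from the "Main PID:" line of
# `systemctl status` output read on stdin.
main_pid() {
    awk '$1 == "Main" && $2 == "PID:" { print $3; exit }'
}

# Hypothetical helper: succeed if both PIDs are non-empty and differ,
# which suggests the service was restarted between the two captures.
was_restarted() {
    [ -n "$1" ] && [ -n "$2" ] && [ "$1" != "$2" ]
}

# Intended use (commented out so the helpers can be sourced safely):
#   before=$(systemctl status pacemaker | main_pid)
#   apt-get install corosync -y
#   after=$(systemctl status pacemaker | main_pid)
#   was_restarted "$before" "$after" && echo "pacemaker was restarted"
```

A changed PID only shows that a new process was started; the "Active:"
line still needs to report the service as running.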

** Tags removed: verification-needed-artful
** Tags added: verification-done-artful

-- 
You received this bug notification because you are a member of STS
Sponsors, which is subscribed to the bug report.
https://bugs.launchpad.net/bugs/1740892

Title:
  corosync upgrade on 2018-01-02 caused pacemaker to fail

Status in OpenStack hacluster charm:
  Invalid
Status in corosync package in Ubuntu:
  Fix Released
Status in pacemaker package in Ubuntu:
  Fix Released
Status in corosync source package in Trusty:
  Won't Fix
Status in pacemaker source package in Trusty:
  Won't Fix
Status in corosync source package in Xenial:
  Fix Committed
Status in pacemaker source package in Xenial:
  Fix Committed
Status in corosync source package in Artful:
  Fix Committed
Status in pacemaker source package in Artful:
  Fix Committed
Status in corosync source package in Bionic:
  Fix Released
Status in corosync package in Debian:
  New

Bug description:
  [Impact]

  When corosync and pacemaker are both installed, a corosync upgrade
  causes pacemaker to fail. Pacemaker then needs to be restarted
  manually to work again; it won't recover by itself.

  [Test Case]

  1) Have corosync (< 2.3.5-3ubuntu2) and pacemaker (< 1.1.14-2ubuntu1.3) installed
  2) Make sure corosync & pacemaker are running via systemctl status cmd.
  3) Upgrade corosync
  4) Check corosync and pacemaker via systemctl status cmd again.

  You will notice pacemaker is dead (inactive) and doesn't recover,
  unless a systemctl start pacemaker is done manually.
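
Steps 2 and 4 can be scripted by reading the state word from the
"Active:" line of the systemctl status output. This is only a sketch
under that assumption about the output format; active_state and
is_running are hypothetical helper names introduced for illustration.

```shell
#!/bin/sh
# Hypothetical helper: print the state word from the "Active:" line of
# `systemctl status` output read on stdin, e.g. "active" or "inactive".
active_state() {
    awk '$1 == "Active:" { print $2; exit }'
}

# Hypothetical helper: succeed only if the status text on stdin
# reports the unit as "active".
is_running() {
    [ "$(active_state)" = "active" ]
}

# Intended use (commented out so the helpers can be sourced safely):
#   for svc in corosync pacemaker; do
#       systemctl status "$svc" | is_running || echo "$svc is not running"
#   done
```

On an affected system, the loop run after step 3 would be expected to
report pacemaker as not running.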

  [Regression Potential]

  Regression potential is low; the patch doesn't change
  corosync/pacemaker core functionality. It makes sure things go more
  smoothly at the packaging level during a corosync upgrade where
  pacemaker is installed/involved.

  This can also be useful, in particular, in situations where the system
  has "unattended-upgrades" enabled (software upgrades without
  supervision) and no sysadmin is available to start pacemaker manually
  because this isn't a scheduled maintenance.

  For the symbol tag change in Artful to (optional), please refer to
  comment #60 from slangasek.

  For the asctime change in Artful, please refer to comment #51 &
  comment #52.

  Note that both Artful changes in pacemaker above are only necessary
  for the package to build (even as-is without this patch). They aren't
  a requirement for the patch to work, but for the src pkg to build.

  [Other Info]

  XENIAL Merge-proposal:
  https://code.launchpad.net/~nacc/ubuntu/+source/corosync/+git/corosync/+merge/336338
  https://code.launchpad.net/~nacc/ubuntu/+source/pacemaker/+git/pacemaker/+merge/336339

  [Original Description]

  During upgrades on 2018-01-02, corosync and its libs were upgraded:

  (from a trusty/mitaka cloud)

  Upgrade: libcmap4:amd64 (2.3.3-1ubuntu3, 2.3.3-1ubuntu4),
  corosync:amd64 (2.3.3-1ubuntu3, 2.3.3-1ubuntu4), libcfg6:amd64
  (2.3.3-1ubuntu3, 2.3.3-1ubuntu4), libcpg4:amd64 (2.3.3-1ubuntu3,
  2.3.3-1ubuntu4), libquorum5:amd64 (2.3.3-1ubuntu3, 2.3.3-1ubuntu4),
  libcorosync-common4:amd64 (2.3.3-1ubuntu3, 2.3.3-1ubuntu4),
  libsam4:amd64 (2.3.3-1ubuntu3, 2.3.3-1ubuntu4), libvotequorum6:amd64
  (2.3.3-1ubuntu3, 2.3.3-1ubuntu4), libtotem-pg5:amd64 (2.3.3-1ubuntu3,
  2.3.3-1ubuntu4)

  During this process, it appears that the pacemaker service is
  restarted and errors out:

  syslog:Jan  2 16:09:33 juju-machine-0-lxc-4 pacemakerd[1994]:   notice: crm_update_peer_state: pcmk_quorum_notification: Node juju-machine-1-lxc-3[1001] - state is now lost (was member)
  syslog:Jan  2 16:09:34 juju-machine-0-lxc-4 pacemakerd[1994]:   notice: crm_update_peer_state: pcmk_quorum_notification: Node juju-machine-1-lxc-3[1001] - state is now member (was lost)
  syslog:Jan  2 16:14:32 juju-machine-0-lxc-4 pacemakerd[1994]:    error: cfg_connection_destroy: Connection destroyed
  syslog:Jan  2 16:14:32 juju-machine-0-lxc-4 pacemakerd[1994]:   notice: pcmk_shutdown_worker: Shuting down Pacemaker
  syslog:Jan  2 16:14:32 juju-machine-0-lxc-4 pacemakerd[1994]:   notice: stop_child: Stopping crmd: Sent -15 to process 2050
  syslog:Jan  2 16:14:32 juju-machine-0-lxc-4 pacemakerd[1994]:    error: pcmk_cpg_dispatch: Connection to the CPG API failed: Library error (2)
  syslog:Jan  2 16:14:32 juju-machine-0-lxc-4 pacemakerd[1994]:    error: mcp_cpg_destroy: Connection destroyed

  This also affected xenial/ocata.

To manage notifications about this bug go to:
https://bugs.launchpad.net/charm-hacluster/+bug/1740892/+subscriptions