sts-sponsors team mailing list archive

[Bug 1739033] Re: Corosync: Assertion 'sender_node != NULL' failed when bind iface is ready after corosync boots

#VERIFICATION FOR TRUSTY

- Packages
ii  corosync                         2.3.3-1ubuntu4                             amd64        Standards-based cluster framework (daemon and modules)
ii  libcorosync-common4              2.3.3-1ubuntu4                             amd64        Standards-based cluster framework, common library

- Reproducer
Use a config file with the bad entries shown in the bug description, then run:
ifdown interface
/usr/sbin/corosync -f
ifup interface

- Debug output:

Dec 22 12:18:27 trusty-corosync corosync[3910]:   [TOTEM ] entering GATHER state from 0(consensus timeout).
Dec 22 12:18:27 trusty-corosync corosync[3910]:   [TOTEM ] Creating commit token because I am the rep.
Dec 22 12:18:27 trusty-corosync corosync[3910]:   [TOTEM ] Saving state aru 0 high seq received 0
Dec 22 12:18:27 trusty-corosync corosync[3910]:   [TOTEM ] Storing new sequence id for ring 4
Dec 22 12:18:27 trusty-corosync corosync[3910]:   [TOTEM ] entering COMMIT state.
Dec 22 12:18:27 trusty-corosync corosync[3910]:   [TOTEM ] got commit token
Dec 22 12:18:27 trusty-corosync corosync[3910]:   [TOTEM ] entering RECOVERY state.
Dec 22 12:18:27 trusty-corosync corosync[3910]:   [TOTEM ] position [0] member 169.254.241.20:
Dec 22 12:18:27 trusty-corosync corosync[3910]:   [TOTEM ] previous ring seq 0 rep 127.0.0.1
Dec 22 12:18:27 trusty-corosync corosync[3910]:   [TOTEM ] aru 0 high delivered 0 received flag 1
Dec 22 12:18:27 trusty-corosync corosync[3910]:   [TOTEM ] Did not need to originate any messages in recovery.
Dec 22 12:18:27 trusty-corosync corosync[3910]:   [TOTEM ] got commit token
Dec 22 12:18:27 trusty-corosync corosync[3910]:   [TOTEM ] Sending initial ORF token
Dec 22 12:18:27 trusty-corosync corosync[3910]:   [TOTEM ] token retrans flag is 0 my set retrans flag0 retrans queue empty 1 count 0, aru 0
Dec 22 12:18:27 trusty-corosync corosync[3910]:   [TOTEM ] install seq 0 aru 0 high seq received 0
Dec 22 12:18:27 trusty-corosync corosync[3910]:   [TOTEM ] token retrans flag is 0 my set retrans flag0 retrans queue empty 1 count 1, aru 0
Dec 22 12:18:27 trusty-corosync corosync[3910]:   [TOTEM ] install seq 0 aru 0 high seq received 0
Dec 22 12:18:27 trusty-corosync corosync[3910]:   [TOTEM ] token retrans flag is 0 my set retrans flag0 retrans queue empty 1 count 2, aru 0
Dec 22 12:18:27 trusty-corosync corosync[3910]:   [TOTEM ] install seq 0 aru 0 high seq received 0
Dec 22 12:18:27 trusty-corosync corosync[3910]:   [TOTEM ] token retrans flag is 0 my set retrans flag0 retrans queue empty 1 count 3, aru 0
Dec 22 12:18:27 trusty-corosync corosync[3910]:   [TOTEM ] install seq 0 aru 0 high seq received 0
Dec 22 12:18:27 trusty-corosync corosync[3910]:   [TOTEM ] retrans flag count 4 token aru 0 install seq 0 aru 0 0
Dec 22 12:18:27 trusty-corosync corosync[3910]:   [TOTEM ] Resetting old ring state
Dec 22 12:18:27 trusty-corosync corosync[3910]:   [TOTEM ] recovery to regular 1-0
Dec 22 12:18:27 trusty-corosync corosync[3910]:   [TOTEM ] waiting_trans_ack changed to 1
Dec 22 12:18:27 trusty-corosync corosync[3910]:   [MAIN  ] Member joined: r(0) ip(169.254.241.20) 
Dec 22 12:18:27 trusty-corosync corosync[3910]:   [TOTEM ] entering OPERATIONAL state.
Dec 22 12:18:27 trusty-corosync corosync[3910]:   [TOTEM ] A new membership (169.254.241.20:4) was formed. Members joined: 1
Dec 22 12:18:27 trusty-corosync corosync[3910]:   [QUORUM] got nodeinfo message from cluster node 1
Dec 22 12:18:27 trusty-corosync corosync[3910]:   [QUORUM] nodeinfo message[1]: votes: 1, expected: 2 flags: 8

root@trusty-corosync:/home/vtapia# corosync-cfgtool -s
Printing ring status.
Local node ID 1
RING ID 0
        id      = 169.254.241.20
        status  = ring 0 active with no faults

Corosync starts as expected.
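
One way to check a log like the one above is to compare the node IDs on the "Members joined" line with the IDs configured in the nodelist. The following is a minimal Python sketch, not part of corosync: the function name joined_nodeids and the hard-coded set CONFIGURED (IDs 1 and 2, taken from the nodelist in the bug description) are assumptions made for illustration.

```python
import re

# Matches the TOTEM membership message, for example:
# "A new membership (169.254.241.20:4) was formed. Members joined: 1"
MEMBERSHIP_RE = re.compile(
    r"A new membership \((?P<addr>[\d.]+):(?P<seq>\d+)\) was formed\."
    r" Members joined:(?P<ids>(?: \d+)+)"
)


def joined_nodeids(line):
    """Return the set of node IDs listed after 'Members joined:' (empty if none)."""
    match = MEMBERSHIP_RE.search(line)
    if match is None:
        return set()
    return {int(nodeid) for nodeid in match.group("ids").split()}


# Assumption: node IDs from the nodelist section of the config in the bug description.
CONFIGURED = {1, 2}

# Log lines copied from this report (fixed Trusty run, failing Xenial run).
fixed = ("Dec 22 12:18:27 trusty-corosync corosync[3910]:   [TOTEM ] "
         "A new membership (169.254.241.20:4) was formed. Members joined: 1")
broken = ("Dec 19 11:36:05 [22167] xenial-pacemaker corosync notice  [TOTEM ] "
          "totemsrp.c:2095 A new membership (169.254.241.10:444) was formed. "
          "Members joined: 704573706")

for line in (fixed, broken):
    unknown = joined_nodeids(line) - CONFIGURED
    if unknown:
        print("unexpected nodeid(s):", sorted(unknown))
    else:
        print("all joined nodeids are configured")
```

With the fixed package, every joined ID should be one of the configured IDs; the failing log instead reports an ID that is not in the nodelist.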

-- 
You received this bug notification because you are a member of STS
Sponsors, which is subscribed to the bug report.
https://bugs.launchpad.net/bugs/1739033

Title:
  Corosync: Assertion 'sender_node != NULL' failed when bind iface is
  ready after corosync boots

Status in corosync package in Ubuntu:
  Fix Released
Status in corosync source package in Trusty:
  Fix Committed
Status in corosync source package in Xenial:
  Fix Committed
Status in corosync source package in Zesty:
  Fix Released
Status in corosync source package in Artful:
  Fix Released

Bug description:
  [Impact]

  Corosync aborts (SIGABRT) if it starts before the interface it has to
  bind to is ready.

  On boot, if no interface in the bindnetaddr range is up/configured,
  corosync binds to lo (127.0.0.1). Once an applicable interface is up,
  corosync crashes with the following error message:

  corosync: votequorum.c:2019: message_handler_req_exec_votequorum_nodeinfo: Assertion `sender_node != NULL' failed.
  Aborted (core dumped)

  The last log entries before the crash show the node forming a new
  membership once the interface is available:

  Dec 19 11:36:05 [22167] xenial-pacemaker corosync debug   [TOTEM ] totemsrp.c:2089 entering OPERATIONAL state.
  Dec 19 11:36:05 [22167] xenial-pacemaker corosync notice  [TOTEM ] totemsrp.c:2095 A new membership (169.254.241.10:444) was formed. Members joined: 704573706

  During the quorum calculation, the generated nodeid (704573706) for
  the node is being used instead of the nodeid specified in the
  configuration file (1), and the assert fails because the nodeid is not
  present in the member list. Corosync should use the correct nodeid and
  continue running after the interface is up, as shown in a fixed
  corosync boot:

  Dec 19 11:50:56 [4824] xenial-corosync corosync notice  [TOTEM ] totemsrp.c:2095 A new membership (169.254.241.10:80) was formed. Members joined: 1
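
The generated nodeid in the failing log (704573706) is consistent with the IPv4 address of the interface (169.254.241.10) read as a 32-bit integer with the highest bit cleared. The Python sketch below only reproduces that arithmetic. It is an observation about the numbers in this report, not a copy of corosync's code, and the function name and the clear_high_bit parameter are hypothetical.

```python
import ipaddress


def nodeid_from_ipv4(addr, clear_high_bit=True):
    """Interpret an IPv4 address as a 32-bit integer, optionally clearing bit 31.

    Hypothetical helper for illustration only; it is not a corosync API.
    """
    value = int(ipaddress.IPv4Address(addr))
    if clear_high_bit:
        value &= 0x7FFFFFFF
    return value


generated = nodeid_from_ipv4("169.254.241.10")
print(generated, hex(generated))
```

Because this value is not one of the nodeids in the nodelist (1 and 2), the lookup for the sender node finds nothing, which matches the failed assertion described above.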

  [Environment]

  Xenial 16.04.3

  Packages:

  ii  corosync                     2.3.5-3ubuntu1    amd64    cluster engine daemon and utilities
  ii  libcorosync-common4:amd64    2.3.5-3ubuntu1    amd64    cluster engine common library

  [Test Case]

  Config:

  totem {
          version: 2
          member {
                  memberaddr: 169.254.241.10
          }
          member {
                  memberaddr: 169.254.241.20
          }
          transport: udpu

          crypto_cipher: none
          crypto_hash: none
          nodeid: 1
          interface {
                  ringnumber: 0
                  bindnetaddr: 169.254.241.0
                  mcastport: 5405
                  ttl: 1
          }
  }

  quorum {
          provider: corosync_votequorum
          expected_votes: 2
  }

  nodelist {
          node {
                  ring0_addr: 169.254.241.10
                  nodeid: 1
          }
          node {
                  ring0_addr: 169.254.241.20
                  nodeid: 2
          }
  }

  1. ifdown interface (169.254.241.10)
  2. start corosync (/usr/sbin/corosync -f)
  3. ifup interface
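
To know which nodeid to expect in the log after step 3, the nodelist section of the config above can be read programmatically. The following is a rough Python sketch under stated assumptions: it uses a simple regular expression rather than a real corosync config parser, it only handles flat node { } blocks like the ones shown here, and the name nodeids_by_address is made up for illustration.

```python
import re

# The nodelist section from the config above.
NODELIST = """\
nodelist {
        node {
                ring0_addr: 169.254.241.10
                nodeid: 1
        }
        node {
                ring0_addr: 169.254.241.20
                nodeid: 2
        }
}
"""


def nodeids_by_address(config_text):
    """Map each ring0_addr to its nodeid for every flat 'node { ... }' block."""
    mapping = {}
    # \bnode\s*\{ does not match "nodelist {", so only the inner blocks are found.
    for body in re.findall(r"\bnode\s*\{([^{}]*)\}", config_text):
        addr = re.search(r"ring0_addr:\s*(\S+)", body)
        nodeid = re.search(r"\bnodeid:\s*(\d+)", body)
        if addr and nodeid:
            mapping[addr.group(1)] = int(nodeid.group(1))
    return mapping


expected = nodeids_by_address(NODELIST)
print(expected.get("169.254.241.10"))
```

For the node under test (169.254.241.10), the configured nodeid is the one that should appear after "Members joined:" once the interface is up.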

  [Regression Potential]

  This patch affects corosync startup; any regression would most likely
  show up as problems during startup and/or configuration parsing.

  [Other info]

  # Upstream corosync commit:
  https://github.com/corosync/corosync/commit/aab55a004bb12ebe78db341dc56759dfe710c1b2

  # git describe aab55a004bb12ebe78db341dc56759dfe710c1b2
  v2.3.5-45-gaab55a0

  # rmadison corosync
  corosync | 2.3.3-1ubuntu1   | trusty          | source, amd64, arm64, armhf, i386, powerpc, ppc64el
  corosync | 2.3.3-1ubuntu3   | trusty-updates  | source, amd64, arm64, armhf, i386, powerpc, ppc64el
  corosync | 2.3.5-3ubuntu1   | xenial          | source, amd64, arm64, armhf, i386, powerpc, ppc64el, s390x
  corosync | 2.4.2-3build1    | zesty           | source, amd64, arm64, armhf, i386, ppc64el, s390x
  corosync | 2.4.2-3build1    | artful          | source, amd64, arm64, armhf, i386, ppc64el, s390x
  corosync | 2.4.2-3build1    | bionic          | source, amd64, arm64, armhf, i386, ppc64el, s390x

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/corosync/+bug/1739033/+subscriptions