sts-sponsors team mailing list archive

[Bug 1739033] Re: Corosync: Assertion 'sender_node != NULL' failed when bind iface is ready after corosync boots

 

#VERIFICATION FOR XENIAL

- Packages
ii  corosync                         2.3.5-3ubuntu2                             amd64        cluster engine daemon and utilities
ii  libcorosync-common4:amd64        2.3.5-3ubuntu2                             amd64        cluster engine common library

- Reproducer
Using a config file with bad entries (as shown in the description):
1. ifdown interface
2. /usr/sbin/corosync -f
3. ifup interface

- Debug output:

Dec 22 11:22:01 xenial-corosync corosync[2742]:   [TOTEM ] totemudpu.c:408 sendmsg(mcast) failed (non-critical): Invalid argument (22)
Dec 22 11:22:02 xenial-corosync corosync[2742]: message repeated 14 times: [   [TOTEM ] totemudpu.c:408 sendmsg(mcast) failed (non-critical): Invalid argument (22)]
Dec 22 11:22:02 xenial-corosync corosync[2742]:   [TOTEM ] totemudpu.c:619 The network interface [169.254.241.20] is now up.
Dec 22 11:22:02 xenial-corosync corosync[2742]:   [TOTEM ] totemudpu.c:1125 adding new UDPU member {169.254.241.10}
Dec 22 11:22:02 xenial-corosync corosync[2742]:   [TOTEM ] totemudpu.c:1125 adding new UDPU member {169.254.241.20}
Dec 22 11:22:02 xenial-corosync corosync[2742]:   [TOTEM ] totemsrp.c:2175 entering GATHER state from 15(interface change).
Dec 22 11:22:05 xenial-corosync corosync[2742]:   [TOTEM ] totemsrp.c:2175 entering GATHER state from 0(consensus timeout).
Dec 22 11:22:05 xenial-corosync corosync[2742]:   [TOTEM ] totemsrp.c:3227 Creating commit token because I am the rep.
Dec 22 11:22:05 xenial-corosync corosync[2742]:   [TOTEM ] totemsrp.c:1591 Saving state aru 0 high seq received 0
Dec 22 11:22:06 xenial-corosync corosync[2742]:   [TOTEM ] totemsrp.c:2224 entering COMMIT state.
Dec 22 11:22:06 xenial-corosync corosync[2742]:   [TOTEM ] totemsrp.c:4571 got commit token
Dec 22 11:22:06 xenial-corosync corosync[2742]:   [TOTEM ] totemsrp.c:2261 entering RECOVERY state.
Dec 22 11:22:06 xenial-corosync corosync[2742]:   [TOTEM ] totemsrp.c:2307 position [0] member 169.254.241.20:
Dec 22 11:22:06 xenial-corosync corosync[2742]:   [TOTEM ] totemsrp.c:2311 previous ring seq 4c rep 127.0.0.1
Dec 22 11:22:06 xenial-corosync corosync[2742]:   [TOTEM ] totemsrp.c:2317 aru 0 high delivered 0 received flag 1
Dec 22 11:22:06 xenial-corosync corosync[2742]:   [TOTEM ] totemsrp.c:2415 Did not need to originate any messages in recovery.
Dec 22 11:22:06 xenial-corosync corosync[2742]:   [TOTEM ] totemsrp.c:4571 got commit token
Dec 22 11:22:06 xenial-corosync corosync[2742]:   [TOTEM ] totemsrp.c:4632 Sending initial ORF token
Dec 22 11:22:06 xenial-corosync corosync[2742]:   [TOTEM ] totemsrp.c:3828 token retrans flag is 0 my set retrans flag0 retrans queue empty 1 count 0, aru 0
Dec 22 11:22:06 xenial-corosync corosync[2742]:   [TOTEM ] totemsrp.c:3839 install seq 0 aru 0 high seq received 0
Dec 22 11:22:06 xenial-corosync corosync[2742]:   [TOTEM ] totemsrp.c:3828 token retrans flag is 0 my set retrans flag0 retrans queue empty 1 count 1, aru 0
Dec 22 11:22:06 xenial-corosync corosync[2742]:   [TOTEM ] totemsrp.c:3839 install seq 0 aru 0 high seq received 0
Dec 22 11:22:06 xenial-corosync corosync[2742]:   [TOTEM ] totemsrp.c:3828 token retrans flag is 0 my set retrans flag0 retrans queue empty 1 count 2, aru 0
Dec 22 11:22:06 xenial-corosync corosync[2742]:   [TOTEM ] totemsrp.c:3839 install seq 0 aru 0 high seq received 0
Dec 22 11:22:06 xenial-corosync corosync[2742]:   [TOTEM ] totemsrp.c:3828 token retrans flag is 0 my set retrans flag0 retrans queue empty 1 count 3, aru 0
Dec 22 11:22:06 xenial-corosync corosync[2742]:   [TOTEM ] totemsrp.c:3839 install seq 0 aru 0 high seq received 0
Dec 22 11:22:06 xenial-corosync corosync[2742]:   [TOTEM ] totemsrp.c:3858 retrans flag count 4 token aru 0 install seq 0 aru 0 0
Dec 22 11:22:06 xenial-corosync corosync[2742]:   [TOTEM ] totemsrp.c:1607 Resetting old ring state
Dec 22 11:22:06 xenial-corosync corosync[2742]:   [TOTEM ] totemsrp.c:1813 recovery to regular 1-0
Dec 22 11:22:06 xenial-corosync corosync[2742]:   [TOTEM ] totempg.c:286 waiting_trans_ack changed to 1
Dec 22 11:22:06 xenial-corosync corosync[2742]:   [TOTEM ] totemudpu.c:1232 Marking UDPU member 169.254.241.20 active
Dec 22 11:22:06 xenial-corosync corosync[2742]:   [TOTEM ] totemsrp.c:2089 entering OPERATIONAL state.
Dec 22 11:22:06 xenial-corosync corosync[2742]:   [TOTEM ] totemsrp.c:2095 A new membership (169.254.241.20:80) was formed. Members joined: 2

root@xenial-corosync:/home/vtapia# corosync-cfgtool -s
Printing ring status.
Local node ID 2
RING ID 0
        id      = 169.254.241.20
        status  = ring 0 active with no faults

Corosync starts as expected.
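
The check done by eye on the debug output above can be sketched in a few lines of Python. This is only an illustration: the function names (joined_nodeids, unexpected_nodeids) are hypothetical, the regular expression is fitted to the "A new membership ... Members joined:" lines quoted in this bug and may not cover other corosync versions, and the configured nodeids {1, 2} come from the nodelist in the description.

```python
import re

# Fitted to the membership lines quoted in this bug report; other corosync
# versions may format the line differently (assumption).
MEMBERSHIP_RE = re.compile(
    r"A new membership \([^)]*\) was formed\. Members joined: (?P<joined>[0-9 ]+)"
)


def joined_nodeids(log_line):
    """Return the nodeids listed after 'Members joined:' (empty if no match)."""
    match = MEMBERSHIP_RE.search(log_line)
    if match is None:
        return []
    return [int(token) for token in match.group("joined").split()]


def unexpected_nodeids(log_line, configured):
    """Return joined nodeids that are not present in the configured nodelist."""
    return [n for n in joined_nodeids(log_line) if n not in configured]


if __name__ == "__main__":
    configured = {1, 2}  # nodeids from the nodelist in the bug description
    line = (
        "Dec 22 11:22:06 xenial-corosync corosync[2742]:   [TOTEM ] "
        "totemsrp.c:2095 A new membership (169.254.241.20:80) was formed. "
        "Members joined: 2"
    )
    print(unexpected_nodeids(line, configured))
```

With the line from the failing run in the description (Members joined: 704573706), the same call would report 704573706 as not configured.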

-- 
You received this bug notification because you are a member of STS
Sponsors, which is subscribed to the bug report.
https://bugs.launchpad.net/bugs/1739033

Title:
  Corosync: Assertion 'sender_node != NULL' failed when bind iface is
  ready after corosync boots

Status in corosync package in Ubuntu:
  Fix Released
Status in corosync source package in Trusty:
  Fix Committed
Status in corosync source package in Xenial:
  Fix Committed
Status in corosync source package in Zesty:
  Fix Released
Status in corosync source package in Artful:
  Fix Released

Bug description:
  [Impact]

  Corosync aborts with SIGABRT if it starts before the interface it has
  to bind to is ready.

  On boot, if no interface in the bindnetaddr range is up/configured,
  corosync binds to lo (127.0.0.1). Once an applicable interface is up,
  corosync crashes with the following error message:

  corosync: votequorum.c:2019: message_handler_req_exec_votequorum_nodeinfo: Assertion `sender_node != NULL' failed.
  Aborted (core dumped)

  The last log entries before the crash show the node forming a new
  membership on the newly available interface:

  Dec 19 11:36:05 [22167] xenial-pacemaker corosync debug   [TOTEM ] totemsrp.c:2089 entering OPERATIONAL state.
  Dec 19 11:36:05 [22167] xenial-pacemaker corosync notice  [TOTEM ] totemsrp.c:2095 A new membership (169.254.241.10:444) was formed. Members joined: 704573706

  During the quorum calculation, corosync uses the auto-generated nodeid
  (704573706) instead of the nodeid specified in the configuration file
  (1), and the assert fails because the generated nodeid is not present
  in the member list. Corosync should use the configured nodeid and
  continue running after the interface comes up, as shown in the log
  from a fixed corosync boot:

  Dec 19 11:50:56 [4824] xenial-corosync corosync notice  [TOTEM ] totemsrp.c:2095 A new membership (169.254.241.10:80) was formed. Members joined: 1
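
  As a side note on where 704573706 may come from: the value matches the
  bind address 169.254.241.10 read as a 32-bit big-endian integer with
  the top bit cleared. The bug report does not state this derivation, so
  treat the sketch below as an assumption that happens to reproduce the
  logged number; the function name derived_nodeid is hypothetical.

```python
import ipaddress


def derived_nodeid(ipv4, clear_high_bit=True):
    """Read an IPv4 address as a 32-bit big-endian integer.

    Assumption (not stated in the bug report): the auto-generated nodeid
    is this integer, optionally with bit 31 cleared.
    """
    value = int(ipaddress.IPv4Address(ipv4))
    if clear_high_bit:
        value &= 0x7FFFFFFF
    return value


if __name__ == "__main__":
    # 169.254.241.10 is the bind address of the failing node in the test case.
    print(derived_nodeid("169.254.241.10"))  # 704573706, as in the failing log
```

  If the assumption holds, it explains why the generated nodeid differs
  from the configured nodeid (1) and so cannot be found in the member
  list built from the nodelist section.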

  [Environment]

  Xenial 16.04.3

  Packages:

  ii  corosync                     2.3.5-3ubuntu1    amd64    cluster engine daemon and utilities
  ii  libcorosync-common4:amd64    2.3.5-3ubuntu1    amd64    cluster engine common library

  [Test Case]

  Config:

  totem {
          version: 2
          member {
                  memberaddr: 169.254.241.10
          }
          member {
                  memberaddr: 169.254.241.20
          }
          transport: udpu

          crypto_cipher: none
          crypto_hash: none
          nodeid: 1
          interface {
                  ringnumber: 0
                  bindnetaddr: 169.254.241.0
                  mcastport: 5405
                  ttl: 1
          }
  }

  quorum {
          provider: corosync_votequorum
          expected_votes: 2
  }

  nodelist {
          node {
                  ring0_addr: 169.254.241.10
                  nodeid: 1
          }
          node {
                  ring0_addr: 169.254.241.20
                  nodeid: 2
          }
  }

  1. ifdown interface (169.254.241.10)
  2. start corosync (/usr/sbin/corosync -f)
  3. ifup interface

  [Regression Potential]

  This patch changes corosync's startup path, so any regression would
  most likely show up as other problems during corosync startup and/or
  configuration parsing.

  [Other info]

  # Upstream corosync commit:
  https://github.com/corosync/corosync/commit/aab55a004bb12ebe78db341dc56759dfe710c1b2

  # git describe aab55a004bb12ebe78db341dc56759dfe710c1b2
  v2.3.5-45-gaab55a0

  # rmadison corosync
  corosync | 2.3.3-1ubuntu1   | trusty          | source, amd64, arm64, armhf, i386, powerpc, ppc64el
  corosync | 2.3.3-1ubuntu3   | trusty-updates  | source, amd64, arm64, armhf, i386, powerpc, ppc64el
  corosync | 2.3.5-3ubuntu1   | xenial          | source, amd64, arm64, armhf, i386, powerpc, ppc64el, s390x
  corosync | 2.4.2-3build1    | zesty           | source, amd64, arm64, armhf, i386, ppc64el, s390x
  corosync | 2.4.2-3build1    | artful          | source, amd64, arm64, armhf, i386, ppc64el, s390x
  corosync | 2.4.2-3build1    | bionic          | source, amd64, arm64, armhf, i386, ppc64el, s390x

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/corosync/+bug/1739033/+subscriptions