touch-packages team mailing list archive
Message #99735
[Bug 1439649] Re: Pacemaker unable to communicate with corosync on restart under lxc
After trying several corosync/pacemaker restarts without luck,
I was able to work around this by adding a 'uidgid'
entry for hacluster:haclient:
* from /var/log/syslog:
Aug 31 18:33:18 juju-machine-3-lxc-3 corosync[901082]: [MAIN ] Denied connection attempt from 108:113
$ getent passwd 108
hacluster:x:108:113::/var/lib/heartbeat:/bin/false
$ getent group 113
haclient:x:113:
* add uidgid config:
# echo $'uidgid {\n uid: hacluster\n gid: haclient\n}' > /etc/corosync/uidgid.d/hacluster
* restart corosync and pacemaker => OK (crm status, etc.)
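The workaround steps above could be wrapped in a small shell helper, roughly as follows. This is only a sketch: the function name `write_uidgid` and its optional third argument (target directory) are hypothetical and are there so the helper can be tried against a scratch directory first. The restart commands in the trailing comment assume the `service` wrapper on Ubuntu 14.04.

```shell
#!/bin/sh
# Hypothetical helper (not part of corosync or pacemaker): write a uidgid
# stanza so corosync accepts IPC connections from the given user and group.
#   $1 = user name, $2 = group name, $3 = target dir (assumed default below)
write_uidgid() {
    user="$1"
    group="$2"
    dir="${3:-/etc/corosync/uidgid.d}"
    mkdir -p "$dir" || return 1
    # Same stanza as the echo one-liner above, one file per user.
    printf 'uidgid {\n    uid: %s\n    gid: %s\n}\n' "$user" "$group" \
        > "$dir/$user"
}

# Example (as root), followed by a restart of both services:
#   write_uidgid hacluster haclient
#   service corosync restart && service pacemaker restart
```

Running it once against a temporary directory and inspecting the generated file before touching /etc/corosync is probably the safer order.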
I can't explain why other units work fine without
this ACL addition (possibly a race during service setup/start?).
--
You received this bug notification because you are a member of Ubuntu
Touch seeded packages, which is subscribed to lxc in Ubuntu.
https://bugs.launchpad.net/bugs/1439649
Title:
Pacemaker unable to communicate with corosync on restart under lxc
Status in lxc package in Ubuntu:
Confirmed
Status in pacemaker package in Ubuntu:
Confirmed
Bug description:
We've seen this a few times with three node clusters, all running in
LXC containers; pacemaker fails to restart correctly as it can't
communicate with corosync, resulting in a down cluster. Rebooting the
containers resolves the issue, so suspect some sort of bad state
either in corosync or pacemaker.
Apr 2 11:41:32 juju-machine-4-lxc-4 pacemakerd[1033741]: notice: mcp_read_config: Configured corosync to accept connections from group 115: Library error (2)
Apr 2 11:41:32 juju-machine-4-lxc-4 pacemakerd[1033741]: notice: main: Starting Pacemaker 1.1.10 (Build: 42f2063): generated-manpages agent-manpages ncurses libqb-logging libqb-ipc lha-fencing upstart nagios heartbeat corosync-native snmp libesmtp
Apr 2 11:41:32 juju-machine-4-lxc-4 pacemakerd[1033741]: notice: cluster_connect_quorum: Quorum acquired
Apr 2 11:41:32 juju-machine-4-lxc-4 pacemakerd[1033741]: notice: corosync_node_name: Unable to get node name for nodeid 1000
Apr 2 11:41:32 juju-machine-4-lxc-4 pacemakerd[1033741]: notice: corosync_node_name: Unable to get node name for nodeid 1001
Apr 2 11:41:32 juju-machine-4-lxc-4 pacemakerd[1033741]: notice: corosync_node_name: Unable to get node name for nodeid 1003
Apr 2 11:41:32 juju-machine-4-lxc-4 pacemakerd[1033741]: notice: corosync_node_name: Unable to get node name for nodeid 1001
Apr 2 11:41:32 juju-machine-4-lxc-4 pacemakerd[1033741]: notice: get_node_name: Defaulting to uname -n for the local corosync node name
Apr 2 11:41:32 juju-machine-4-lxc-4 pacemakerd[1033741]: notice: crm_update_peer_state: pcmk_quorum_notification: Node juju-machine-4-lxc-4[1001] - state is now member (was (null))
Apr 2 11:41:32 juju-machine-4-lxc-4 pacemakerd[1033741]: notice: corosync_node_name: Unable to get node name for nodeid 1003
Apr 2 11:41:32 juju-machine-4-lxc-4 pacemakerd[1033741]: notice: crm_update_peer_state: pcmk_quorum_notification: Node (null)[1003] - state is now member (was (null))
Apr 2 11:41:32 juju-machine-4-lxc-4 crmd[1033748]: notice: main: CRM Git Version: 42f2063
Apr 2 11:41:32 juju-machine-4-lxc-4 stonith-ng[1033744]: notice: crm_cluster_connect: Connecting to cluster infrastructure: corosync
Apr 2 11:41:32 juju-machine-4-lxc-4 stonith-ng[1033744]: notice: corosync_node_name: Unable to get node name for nodeid 1001
Apr 2 11:41:32 juju-machine-4-lxc-4 stonith-ng[1033744]: notice: get_node_name: Defaulting to uname -n for the local corosync node name
Apr 2 11:41:32 juju-machine-4-lxc-4 attrd[1033746]: notice: crm_cluster_connect: Connecting to cluster infrastructure: corosync
Apr 2 11:41:32 juju-machine-4-lxc-4 corosync[1033732]: [MAIN ] Denied connection attempt from 109:115
Apr 2 11:41:32 juju-machine-4-lxc-4 corosync[1033732]: [QB ] Invalid IPC credentials (1033732-1033746).
Apr 2 11:41:32 juju-machine-4-lxc-4 attrd[1033746]: error: cluster_connect_cpg: Could not connect to the Cluster Process Group API: 11
Apr 2 11:41:32 juju-machine-4-lxc-4 attrd[1033746]: error: main: HA Signon failed
Apr 2 11:41:32 juju-machine-4-lxc-4 attrd[1033746]: error: main: Aborting startup
Apr 2 11:41:32 juju-machine-4-lxc-4 pacemakerd[1033741]: error: pcmk_child_exit: Child process attrd (1033746) exited: Network is down (100)
Apr 2 11:41:32 juju-machine-4-lxc-4 pacemakerd[1033741]: warning: pcmk_child_exit: Pacemaker child process attrd no longer wishes to be respawned. Shutting ourselves down.
Apr 2 11:41:32 juju-machine-4-lxc-4 pacemakerd[1033741]: notice: pcmk_shutdown_worker: Shuting down Pacemaker
Apr 2 11:41:32 juju-machine-4-lxc-4 pacemakerd[1033741]: notice: stop_child: Stopping crmd: Sent -15 to process 1033748
Apr 2 11:41:32 juju-machine-4-lxc-4 crmd[1033748]: warning: do_cib_control: Couldn't complete CIB registration 1 times... pause and retry
Apr 2 11:41:32 juju-machine-4-lxc-4 crmd[1033748]: notice: crm_shutdown: Requesting shutdown, upper limit is 1200000ms
Apr 2 11:41:32 juju-machine-4-lxc-4 crmd[1033748]: warning: do_log: FSA: Input I_SHUTDOWN from crm_shutdown() received in state S_STARTING
Apr 2 11:41:32 juju-machine-4-lxc-4 crmd[1033748]: notice: do_state_transition: State transition S_STARTING -> S_STOPPING [ input=I_SHUTDOWN cause=C_SHUTDOWN origin=crm_shutdown ]
Apr 2 11:41:32 juju-machine-4-lxc-4 cib[1033743]: notice: crm_cluster_connect: Connecting to cluster infrastructure: corosync
Apr 2 11:41:32 juju-machine-4-lxc-4 crmd[1033748]: notice: terminate_cs_connection: Disconnecting from Corosync
Apr 2 11:41:32 juju-machine-4-lxc-4 corosync[1033732]: [MAIN ] Denied connection attempt from 109:115
Apr 2 11:41:32 juju-machine-4-lxc-4 corosync[1033732]: [QB ] Invalid IPC credentials (1033732-1033743).
Apr 2 11:41:32 juju-machine-4-lxc-4 cib[1033743]: error: cluster_connect_cpg: Could not connect to the Cluster Process Group API: 11
Apr 2 11:41:32 juju-machine-4-lxc-4 cib[1033743]: crit: cib_init: Cannot sign in to the cluster... terminating
Apr 2 11:41:32 juju-machine-4-lxc-4 pacemakerd[1033741]: notice: stop_child: Stopping pengine: Sent -15 to process 1033747
Apr 2 11:41:32 juju-machine-4-lxc-4 pacemakerd[1033741]: error: pcmk_child_exit: Child process cib (1033743) exited: Network is down (100)
Apr 2 11:41:32 juju-machine-4-lxc-4 pacemakerd[1033741]: warning: pcmk_child_exit: Pacemaker child process cib no longer wishes to be respawned. Shutting ourselves down.
Apr 2 11:41:32 juju-machine-4-lxc-4 pacemakerd[1033741]: notice: stop_child: Stopping lrmd: Sent -15 to process 1033745
Apr 2 11:41:32 juju-machine-4-lxc-4 pacemakerd[1033741]: notice: stop_child: Stopping stonith-ng: Sent -15 to process 1033744
Apr 2 11:41:34 juju-machine-4-lxc-4 corosync[1033732]: [TOTEM ] A new membership (10.245.160.62:284) was formed. Members joined: 1000
Apr 2 11:41:41 juju-machine-4-lxc-4 stonith-ng[1033744]: error: setup_cib: Could not connect to the CIB service: Transport endpoint is not connected (-107)
Apr 2 11:41:41 juju-machine-4-lxc-4 pacemakerd[1033741]: notice: pcmk_shutdown_worker: Shutdown complete
Apr 2 11:41:41 juju-machine-4-lxc-4 pacemakerd[1033741]: notice: pcmk_shutdown_worker: Attempting to inhibit respawning after fatal error
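The "Denied connection attempt from 109:115" lines in the log above carry the uid:gid pair that corosync rejected. A minimal sketch for pulling those pairs out of a syslog stream is shown here; the function name `denied_ids` is hypothetical, and the pattern assumes the message format seen in this report.

```shell
#!/bin/sh
# Hypothetical helper: read syslog lines on stdin and print each distinct
# "uid gid" pair that corosync reported as a denied connection attempt.
denied_ids() {
    sed -n 's/.*Denied connection attempt from \([0-9][0-9]*\):\([0-9][0-9]*\).*/\1 \2/p' \
        | sort -u
}

# Example: resolve the numeric ids to names with getent.
#   denied_ids < /var/log/syslog | while read -r uid gid; do
#       getent passwd "$uid"
#       getent group "$gid"
#   done
```

The resolved user and group names are the ones a uidgid entry would need to list.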
ProblemType: Bug
DistroRelease: Ubuntu 14.04
Package: pacemaker 1.1.10+git20130802-1ubuntu2.3
ProcVersionSignature: Ubuntu 3.16.0-33.44~14.04.1-generic 3.16.7-ckt7
Uname: Linux 3.16.0-33-generic x86_64
NonfreeKernelModules: vhost_net vhost macvtap macvlan xt_conntrack ipt_REJECT ip6table_filter ip6_tables ebtable_nat ebtables veth 8021q garp xt_CHECKSUM mrp iptable_mangle ipt_MASQUERADE iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack xt_tcpudp iptable_filter ip_tables x_tables nbd ib_iser rdma_cm iw_cm ib_cm ib_sa ib_mad ib_core ib_addr iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi openvswitch gre vxlan dm_crypt bridge dm_multipath intel_rapl stp scsi_dh x86_pkg_temp_thermal llc intel_powerclamp coretemp ioatdma kvm_intel ipmi_si joydev sb_edac kvm hpwdt hpilo dca ipmi_msghandler acpi_power_meter edac_core lpc_ich shpchp serio_raw mac_hid xfs libcrc32c btrfs xor raid6_pq hid_generic usbhid hid crct10dif_pclmul crc32_pclmul ghash_clmulni_intel aesni_intel aes_x86_64 lrw gf128mul glue_helper ablk_helper cryptd psmouse tg3 ptp pata_acpi hpsa pps_core
ApportVersion: 2.14.1-0ubuntu3.7
Architecture: amd64
Date: Thu Apr 2 11:42:18 2015
SourcePackage: pacemaker
UpgradeStatus: No upgrade log present (probably fresh install)