touch-packages team mailing list archive
-
touch-packages team
-
Mailing list archive
-
Message #100339
[Bug 1439649] Re: Pacemaker unable to communicate with corosync on restart under lxc
Looking at logs from bug 1491228, it would appear that the first time
that pacemaker goes to talk to the corosync daemon it gets denied. Per
upstream docs [0], if the enable-acl property isn't explicitly enabled,
then any user in the haclient group should have access. Since the
hacluster charm doesn't explicitly enable the acl, I'd expect pacemaker
to be running under the haclient/hacluster uid/gid.
The charms don't enable the acl and the package creates the hacluster
user in the haclient group, so I suspect there's something additional
going on here. For completeness, I'll attach the syslog in here. The
pacemaker node did start after following JuanJo's workaround in comment
#15.
[0] - http://clusterlabs.org/doc/acls.html
--
You received this bug notification because you are a member of Ubuntu
Touch seeded packages, which is subscribed to lxc in Ubuntu.
https://bugs.launchpad.net/bugs/1439649
Title:
Pacemaker unable to communicate with corosync on restart under lxc
Status in lxc package in Ubuntu:
Confirmed
Status in pacemaker package in Ubuntu:
Confirmed
Bug description:
We've seen this a few times with three node clusters, all running in
LXC containers; pacemaker fails to restart correctly as it can't
communicate with corosync, resulting in a down cluster. Rebooting the
containers resolves the issue, so suspect some sort of bad state
either in corosync or pacemaker.
Apr 2 11:41:32 juju-machine-4-lxc-4 pacemakerd[1033741]: notice: mcp_read_config: Configured corosync to accept connections from group 115: Library error (2)
Apr 2 11:41:32 juju-machine-4-lxc-4 pacemakerd[1033741]: notice: main: Starting Pacemaker 1.1.10 (Build: 42f2063): generated-manpages agent-manpages ncurses libqb-logging libqb-ipc lha-fencing upstart nagios heartbeat corosync-native snmp libesmtp
Apr 2 11:41:32 juju-machine-4-lxc-4 pacemakerd[1033741]: notice: cluster_connect_quorum: Quorum acquired
Apr 2 11:41:32 juju-machine-4-lxc-4 pacemakerd[1033741]: notice: corosync_node_name: Unable to get node name for nodeid 1000
Apr 2 11:41:32 juju-machine-4-lxc-4 pacemakerd[1033741]: notice: corosync_node_name: Unable to get node name for nodeid 1001
Apr 2 11:41:32 juju-machine-4-lxc-4 pacemakerd[1033741]: notice: corosync_node_name: Unable to get node name for nodeid 1003
Apr 2 11:41:32 juju-machine-4-lxc-4 pacemakerd[1033741]: notice: corosync_node_name: Unable to get node name for nodeid 1001
Apr 2 11:41:32 juju-machine-4-lxc-4 pacemakerd[1033741]: notice: get_node_name: Defaulting to uname -n for the local corosync node name
Apr 2 11:41:32 juju-machine-4-lxc-4 pacemakerd[1033741]: notice: crm_update_peer_state: pcmk_quorum_notification: Node juju-machine-4-lxc-4[1001] - state is now member (was (null))
Apr 2 11:41:32 juju-machine-4-lxc-4 pacemakerd[1033741]: notice: corosync_node_name: Unable to get node name for nodeid 1003
Apr 2 11:41:32 juju-machine-4-lxc-4 pacemakerd[1033741]: notice: crm_update_peer_state: pcmk_quorum_notification: Node (null)[1003] - state is now member (was (null))
Apr 2 11:41:32 juju-machine-4-lxc-4 crmd[1033748]: notice: main: CRM Git Version: 42f2063
Apr 2 11:41:32 juju-machine-4-lxc-4 stonith-ng[1033744]: notice: crm_cluster_connect: Connecting to cluster infrastructure: corosync
Apr 2 11:41:32 juju-machine-4-lxc-4 stonith-ng[1033744]: notice: corosync_node_name: Unable to get node name for nodeid 1001
Apr 2 11:41:32 juju-machine-4-lxc-4 stonith-ng[1033744]: notice: get_node_name: Defaulting to uname -n for the local corosync node name
Apr 2 11:41:32 juju-machine-4-lxc-4 attrd[1033746]: notice: crm_cluster_connect: Connecting to cluster infrastructure: corosync
Apr 2 11:41:32 juju-machine-4-lxc-4 corosync[1033732]: [MAIN ] Denied connection attempt from 109:115
Apr 2 11:41:32 juju-machine-4-lxc-4 corosync[1033732]: [QB ] Invalid IPC credentials (1033732-1033746).
Apr 2 11:41:32 juju-machine-4-lxc-4 attrd[1033746]: error: cluster_connect_cpg: Could not connect to the Cluster Process Group API: 11
Apr 2 11:41:32 juju-machine-4-lxc-4 attrd[1033746]: error: main: HA Signon failed
Apr 2 11:41:32 juju-machine-4-lxc-4 attrd[1033746]: error: main: Aborting startup
Apr 2 11:41:32 juju-machine-4-lxc-4 pacemakerd[1033741]: error: pcmk_child_exit: Child process attrd (1033746) exited: Network is down (100)
Apr 2 11:41:32 juju-machine-4-lxc-4 pacemakerd[1033741]: warning: pcmk_child_exit: Pacemaker child process attrd no longer wishes to be respawned. Shutting ourselves down.
Apr 2 11:41:32 juju-machine-4-lxc-4 pacemakerd[1033741]: notice: pcmk_shutdown_worker: Shuting down Pacemaker
Apr 2 11:41:32 juju-machine-4-lxc-4 pacemakerd[1033741]: notice: stop_child: Stopping crmd: Sent -15 to process 1033748
Apr 2 11:41:32 juju-machine-4-lxc-4 crmd[1033748]: warning: do_cib_control: Couldn't complete CIB registration 1 times... pause and retry
Apr 2 11:41:32 juju-machine-4-lxc-4 crmd[1033748]: notice: crm_shutdown: Requesting shutdown, upper limit is 1200000ms
Apr 2 11:41:32 juju-machine-4-lxc-4 crmd[1033748]: warning: do_log: FSA: Input I_SHUTDOWN from crm_shutdown() received in state S_STARTING
Apr 2 11:41:32 juju-machine-4-lxc-4 crmd[1033748]: notice: do_state_transition: State transition S_STARTING -> S_STOPPING [ input=I_SHUTDOWN cause=C_SHUTDOWN origin=crm_shutdown ]
Apr 2 11:41:32 juju-machine-4-lxc-4 cib[1033743]: notice: crm_cluster_connect: Connecting to cluster infrastructure: corosync
Apr 2 11:41:32 juju-machine-4-lxc-4 crmd[1033748]: notice: terminate_cs_connection: Disconnecting from Corosync
Apr 2 11:41:32 juju-machine-4-lxc-4 corosync[1033732]: [MAIN ] Denied connection attempt from 109:115
Apr 2 11:41:32 juju-machine-4-lxc-4 corosync[1033732]: [QB ] Invalid IPC credentials (1033732-1033743).
Apr 2 11:41:32 juju-machine-4-lxc-4 cib[1033743]: error: cluster_connect_cpg: Could not connect to the Cluster Process Group API: 11
Apr 2 11:41:32 juju-machine-4-lxc-4 cib[1033743]: crit: cib_init: Cannot sign in to the cluster... terminating
Apr 2 11:41:32 juju-machine-4-lxc-4 pacemakerd[1033741]: notice: stop_child: Stopping pengine: Sent -15 to process 1033747
Apr 2 11:41:32 juju-machine-4-lxc-4 pacemakerd[1033741]: error: pcmk_child_exit: Child process cib (1033743) exited: Network is down (100)
Apr 2 11:41:32 juju-machine-4-lxc-4 pacemakerd[1033741]: warning: pcmk_child_exit: Pacemaker child process cib no longer wishes to be respawned. Shutting ourselves down.
Apr 2 11:41:32 juju-machine-4-lxc-4 pacemakerd[1033741]: notice: stop_child: Stopping lrmd: Sent -15 to process 1033745
Apr 2 11:41:32 juju-machine-4-lxc-4 pacemakerd[1033741]: notice: stop_child: Stopping stonith-ng: Sent -15 to process 1033744
Apr 2 11:41:34 juju-machine-4-lxc-4 corosync[1033732]: [TOTEM ] A new membership (10.245.160.62:284) was formed. Members joined: 1000
Apr 2 11:41:41 juju-machine-4-lxc-4 stonith-ng[1033744]: error: setup_cib: Could not connect to the CIB service: Transport endpoint is not connected (-107)
Apr 2 11:41:41 juju-machine-4-lxc-4 pacemakerd[1033741]: notice: pcmk_shutdown_worker: Shutdown complete
Apr 2 11:41:41 juju-machine-4-lxc-4 pacemakerd[1033741]: notice: pcmk_shutdown_worker: Attempting to inhibit respawning after fatal error
ProblemType: Bug
DistroRelease: Ubuntu 14.04
Package: pacemaker 1.1.10+git20130802-1ubuntu2.3
ProcVersionSignature: User Name 3.16.0-33.44~14.04.1-generic 3.16.7-ckt7
Uname: Linux 3.16.0-33-generic x86_64
NonfreeKernelModules: vhost_net vhost macvtap macvlan xt_conntrack ipt_REJECT ip6table_filter ip6_tables ebtable_nat ebtables veth 8021q garp xt_CHECKSUM mrp iptable_mangle ipt_MASQUERADE iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack xt_tcpudp iptable_filter ip_tables x_tables nbd ib_iser rdma_cm iw_cm ib_cm ib_sa ib_mad ib_core ib_addr iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi openvswitch gre vxlan dm_crypt bridge dm_multipath intel_rapl stp scsi_dh x86_pkg_temp_thermal llc intel_powerclamp coretemp ioatdma kvm_intel ipmi_si joydev sb_edac kvm hpwdt hpilo dca ipmi_msghandler acpi_power_meter edac_core lpc_ich shpchp serio_raw mac_hid xfs libcrc32c btrfs xor raid6_pq hid_generic usbhid hid crct10dif_pclmul crc32_pclmul ghash_clmulni_intel aesni_intel aes_x86_64 lrw gf128mul glue_helper ablk_helper cryptd psmouse tg3 ptp pata_acpi hpsa pps_core
ApportVersion: 2.14.1-0ubuntu3.7
Architecture: amd64
Date: Thu Apr 2 11:42:18 2015
SourcePackage: pacemaker
UpgradeStatus: No upgrade log present (probably fresh install)
To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/lxc/+bug/1439649/+subscriptions