
Cluster How to - Ubuntu 10.04 LTS

 

How-to attached!

Cheers,
-- 
--
Boris Devouge		<boris.devouge@xxxxxxxxxxxxx>
Sales Engineering	Office: +44 (0)20 7630 2476
Canonical               	Mobile: +44 7809 389 874
GPG FPR: ADB9 0AE9 2451 2BAD B2C7  BB2C DB22 052A 7A37 FC75
2-node cluster on Ubuntu with Pacemaker & Corosync, CLVMD and Apache, IP and GFS2 (including DLM) & Virtual Machine resources
--

[all nodes] 1. Install systems, change hostnames, configure network devices. (out of scope)

Considerations:
- Use two networks, a public network for users to access the service and a private network for the cluster heartbeat traffic.

- Use bonding for network interfaces (active/standby mode is recommended) connected to a redundant switch configuration for network resilience.

- If you are using shared storage via fibre channel, ensure that you have a multipathing technology correctly configured (device-mapper-multipath is recommended)


[all nodes] 2. Add Hostnames to /etc/hosts, so we do not rely on DNS

 e.g.

# Cluster Public Network
192.168.122.21 cluster-node-1.example.com cluster-node-1
192.168.122.22 cluster-node-2.example.com cluster-node-2

# Cluster Heartbeat Network
10.0.0.1 cluster-node-1-priv.example.com cluster-node-1-priv
10.0.0.2 cluster-node-2-priv.example.com cluster-node-2-priv


[all nodes] 3. NTPD
Ensure NTPD is installed and running (out of scope)

 # sudo apt-get install -y ntp
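
As a quick sanity check that the daemon is synchronising against its peers (output will vary with your NTP servers), you can run:

 # ntpq -p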

[all nodes] 4. Install the ubuntu-ha PPA (Personal Package Archive) repository:

The easiest way to do this is to use the add-apt-repository command which is provided by python-software-properties.

 # sudo apt-get install -y python-software-properties
 # sudo add-apt-repository ppa:ubuntu-ha/lucid-cluster
 
Update the apt repository package lists
 # sudo apt-get update

All of the cluster components come from the ubuntu-ha team's PPA because some of the components have not yet reached the main repository.


[all nodes] 5. Install Pacemaker & gfs2-pacemaker (GFS2 resource):  

 # sudo apt-get install -y pacemaker gfs2-pacemaker 

All of these packages should be installed as dependencies:
cluster-agents cluster-glue corosync fancontrol gawk gfs2-pacemaker gfs2-tools libccs3 libcluster-glue libcorosync4 libcurl3 libdlm3 libdlm3-pacemaker libdlmcontrol3 libesmtp5 libheartbeat2 liblogthread3 libltdl7 libnet1 libnspr4-0d libnss3-1d libopenais3 libopenhpi2 libopenipmi0 libperl5.10 libsensors4 libsnmp-base libsnmp15 libxslt1.1 lm-sensors openhpid pacemaker


[all nodes] 6. Enable corosync

To allow corosync to start, you must modify the /etc/default/corosync file, otherwise the init script will not start it:

# sudo vim /etc/default/corosync

Change:
start=no

To:
start=yes
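
If you prefer to make this change non-interactively (e.g. when preparing several nodes), a sed one-liner along these lines should do it; note that it matches the lowercase start= shown above, so adjust it if your file uses START= instead:

 # sudo sed -i 's/^start=no/start=yes/' /etc/default/corosync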


[all nodes] 7. Configure corosync
 
On all nodes, modify your corosync configuration file /etc/corosync/corosync.conf

 # sudo vim /etc/corosync/corosync.conf 

Replace the following lines:

bindnetaddr: <your heartbeat network> e.g. if the heartbeat interface is 10.0.0.1 then bindnetaddr would be 10.0.0.0
mcastaddr: 225.94.1.1
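
As an illustration only, using the heartbeat network from the /etc/hosts example above, the interface section inside the totem block would end up looking roughly like this (other values left at their packaged defaults):

 interface {
         ringnumber: 0
         bindnetaddr: 10.0.0.0
         mcastaddr: 225.94.1.1
         mcastport: 5405
 }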


[all nodes] 8. Start corosync & pacemaker

 # sudo service corosync start
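
In this setup pacemaker is normally launched by corosync itself, so no separate init script is started here; assuming that holds for your packages too, you can confirm both stacks are up by checking for their processes, e.g.:

 # ps -ef | egrep 'corosync|crmd|cib'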


[all nodes] 9. Check the cluster status

Run the pacemaker crm monitor tool to check out the current cluster status:

 # sudo crm_mon -1

You may have to wait a few minutes for the cluster to form.


[one node] 10. Add cluster options

 - Basic options
 
[optional] During testing you may not have a fencing / STONITH device, so we need to disable STONITH (NOT recommended for production, fencing is critical for cluster operation!)

 # sudo crm configure property stonith-enabled=false
 
 We are using a two node cluster so during testing we need to disable the quorum policy (NOT recommended for production!)

 # sudo crm configure property no-quorum-policy=ignore

 We want resources to stick to their node when failing over (so we don't have a ping pong effect of services going back and forth when a node fails)

 # sudo crm configure rsc_defaults resource-stickiness="100"
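
To review the options and defaults you have just set, you can dump the current cluster configuration:

 # sudo crm configure show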


[one node] 11. Add cluster resources for DLM
 
 - If we want to use a cluster file system we need the distributed lock manager (DLM) running on each node

 # sudo crm configure primitive DLM ocf:pacemaker:controld op monitor interval="120s"
 # sudo crm configure clone cloneDLM DLM meta globally-unique="false" target-role="Started" interleave="true"


[all nodes] 12. Clustered Logical Volume Manager Daemon (CLVMD)

The clvm package provided in Main with Lucid is not compiled with support for corosync, only cman (the cluster manager from Red Hat Cluster Suite).
Updates on this are being tracked here: https://bugs.launchpad.net/ubuntu/+source/lvm2/+bug/525287

A modified version of clvmd is provided in the ubuntu-ha PPA, compiled with --with-clvmd=cman,corosync

 - Install package

 # sudo apt-get install -y clvm

 - Change LVM locking type 

 # sudo vim /etc/lvm/lvm.conf
 # Change 'locking_type = 1' to 'locking_type = 3'
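
 The same change can be made non-interactively; a sed invocation along these lines should work (the exact whitespace in your lvm.conf may differ):

 # sudo sed -i 's/locking_type = 1/locking_type = 3/' /etc/lvm/lvm.conf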

 - Ensure clvm does not try to start on boot

 # sudo update-rc.d clvm disable 

 - Reboot cluster nodes - as lvm2 has been replaced we need to reboot to ensure all changes are in place

 - Ensure cluster nodes are correctly rejoined to the cluster

 # sudo crm_mon -1

 - Add the clvmd resource to pacemaker

 # sudo crm configure primitive CLVM ocf:lvm2:clvmd params daemon_timeout="30"
 # sudo crm configure clone cloneCLVM CLVM meta globally-unique="false" target-role="Started" interleave="true" 

 - Check that all services are running correctly

 # sudo crm_mon -1

---------------
 e.g.
 trellis@cluster-node-1:~$ sudo crm_mon -1
 ============
 Last updated: Tue May  4 17:07:53 2010
 Stack: openais
 Current DC: cluster-node-1 - partition with quorum
 Version: 1.0.8-042548a451fce8400660f6031f4da6f0223dd5dd
 2 Nodes configured, 2 expected votes
 2 Resources configured.
 ============

 Online: [ cluster-node-1 cluster-node-2 ]

  Clone Set: cloneDLM
      Started: [ cluster-node-1 cluster-node-2 ]
  Clone Set: cloneCLVM
      Started: [ cluster-node-1 cluster-node-2 ]
---------------

[one node] 13. Configure fencing / STONITH

Fencing or STONITH is critical for production clusters
 WARNING: Without STONITH data corruption can occur.

You can find out which STONITH devices are supported by running the following command:
 
 # sudo stonith -L

The best STONITH methods are external devices that do not rely on the host operating system (which may, for example, be stuck in a kernel panic); these are devices such as APC power switches and remote management cards (IBM RSA, HP iLO, Dell DRAC), although the latter rely on the server still having power.

After choosing a stonith method, more information on the parameters you will need can be found through the following commands:

- Parameters required
 # sudo stonith -t <method> -n

- Help page
 # sudo stonith -t <method> -h
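
For example, for the 'ssh' method used below, the parameter listing and help text can be viewed with:

 # sudo stonith -t external/ssh -n
 # sudo stonith -t external/ssh -h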

In my example I do not have any external fencing devices, so I will use the 'ssh' method; this is not recommended for production as it is not external to the operating system. In this example I also separate the ssh STONITH devices into two distinct resources, just as you would for any other external STONITH configuration, as you may have multiple devices with different usernames/passwords.

- Configure first ssh fence agent:
 # sudo crm configure primitive stonith-ssh-1 stonith:external/ssh params hostlist="cluster-node-1"

- Configure second ssh fence agent:
 # sudo crm configure primitive stonith-ssh-2 stonith:external/ssh params hostlist="cluster-node-2"

Configure a location constraint so that stonith-ssh-1 (which fences cluster-node-1) never runs on cluster-node-1, as a node should not be responsible for fencing itself:
 # sudo crm configure location l-stonith-node1 stonith-ssh-1 -inf: cluster-node-1

Configure the equivalent constraint so that stonith-ssh-2 never runs on cluster-node-2:
 # sudo crm configure location l-stonith-node2 stonith-ssh-2 -inf: cluster-node-2
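
If you disabled STONITH for testing in the cluster options step, remember to re-enable it now that working fencing devices are configured:

 # sudo crm configure property stonith-enabled=true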

After configuration of the STONITH devices your crm_mon output should look similar to the output below:

---------------
trellis@cluster-node-1:~$ sudo crm_mon -1
============
Last updated: Tue May  4 17:30:40 2010
Stack: openais
Current DC: cluster-node-1 - partition with quorum
Version: 1.0.8-042548a451fce8400660f6031f4da6f0223dd5dd
2 Nodes configured, 2 expected votes
4 Resources configured.
============

Online: [ cluster-node-1 cluster-node-2 ]

 Clone Set: cloneDLM
     Started: [ cluster-node-1 cluster-node-2 ]
 Clone Set: cloneCLVM
     Started: [ cluster-node-1 cluster-node-2 ]
 stonith-ssh-1	(stonith:external/ssh):	Started cluster-node-2
 stonith-ssh-2	(stonith:external/ssh):	Started cluster-node-1
---------------

14. Add IP and Website resources

In this example we add an Apache2 resource and an IP address for users to access our website:

- [all nodes] Install apache2
 # sudo apt-get install -y apache2 

- [all nodes] We want to control apache2's startup via the cluster, so remove the init links that start it on boot
 # sudo update-rc.d apache2 disable

- [one node] Add a resource for apache, I will call mine 'Website'
 # sudo crm configure primitive Website ocf:heartbeat:apache params configfile="/etc/apache2/apache2.conf" op monitor interval="1min"

- [one node] Add an IP resource for our website, I will call mine 'WebsiteIP' with IP 192.168.122.252
 # sudo crm configure primitive WebsiteIP ocf:heartbeat:IPaddr2 params ip="192.168.122.252" cidr_netmask="32" op monitor interval="30s"

- [one node] So that they both stay on the same node, we need to add a colocation rule
 # sudo crm configure colocation website-with-ip inf: Website WebsiteIP
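
- [one node] Optionally, add an order constraint (the name is arbitrary) so that the IP address is brought up before Apache starts
 # sudo crm configure order ip-before-website inf: WebsiteIP Website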


15. Add a Clustered LVM volume (CLVM) & GFS file system

To add a cluster file system, we first need some form of shared storage presenting the same block device(s) to all nodes; this could be a storage area network accessed over iSCSI, Fibre Channel or other transport methods. DRBD could also be used to provide a networked RAID1 device, essentially giving us a replicated drive we can use as shared storage.

Recommendations: If you are using a disk or LUN from a SAN, ensure you are using a method of multipathing. For Fibre Channel, device-mapper-multipath is highly recommended.

In this example all nodes see a single shared disk, which is directly attached as a block device; no multipathing is in use as this is within a virtual environment. I will also use clustered LVM on top of the disk for added flexibility.


15a.) Configuring the LVM volume

[one node] Create a LVM physical volume:

 # sudo pvcreate /dev/vdb

[one node] Create a LVM volume group:

 # sudo vgcreate vgShared /dev/vdb

[one node] Create a LVM logical volume:

 # sudo lvcreate -n lvGFS -l 249 /dev/vgShared
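
 The extent count (-l 249) is specific to the disk used in this example; as an alternative you can simply allocate all free space in the volume group, e.g.:

 # sudo lvcreate -n lvGFS -l 100%FREE vgShared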

[one node] Tell all cluster nodes to refresh their cache

 # sudo clvmd -R

[all nodes] From each node, test pvdisplay, vgdisplay and lvdisplay to make sure you can see the shared volume

 # sudo pvdisplay

 # sudo vgdisplay

 The output from vgdisplay should show you a "clustered" flag set to "yes"

 # sudo lvdisplay


15b.) Creating the GFS file system

[one node] Create the GFS2 file system

  Each node needs its own journal; this is specified by the -j option, so if you have three nodes you will need to use -j3

 # sudo mkfs.gfs2 -p lock_dlm -j2 -t pcmk:pcmk /dev/mapper/vgShared-lvGFS


15c.) Creating a GFS resource

Before we can mount the GFS2 file system we must first have a running gfs_controld process; this will be part of our cluster resources, and we need to make sure it is running on every node.

[onenode] - Add a resource for the GFS control daemon

 # sudo crm configure primitive GFSD ocf:pacemaker:controld params daemon="gfs_controld.pcmk" args="" op monitor interval="120s"

 - Create the clone, so we can run it on more than one node

[onenode] # sudo crm configure clone cloneGFSD GFSD meta globally-unique="false" interleave="true" target-role="Started"

 - Ensure GFS and the DLM (distributed locking manager) colocate together

[onenode] # sudo crm configure colocation colGFSDDLM inf: cloneGFSD cloneDLM

 - Ensure GFS is loaded after DLM

[onenode] # sudo crm configure order ordDLMGFSD 0: cloneDLM cloneGFSD

 - Create a directory where the cluster file system should be mounted
 
[allnodes] # sudo mkdir /data

 - Ensure GFS does not start on boot (it is fully controlled by pacemaker)

[allnodes] # sudo update-rc.d gfs2-tools disable

 - Add the file system as a resource in Pacemaker

[onenode] # sudo crm configure primitive FS ocf:heartbeat:Filesystem params device="/dev/mapper/vgShared-lvGFS" directory="/data" fstype="gfs2" \ 
    op monitor interval="120s" meta target-role="Started"

 - Clone the file system resource so it can be started on all nodes at the same time
 
[onenode] # sudo crm configure clone cloneFS FS meta interleave="true" ordered="true" target-role="Started"

 - Ensure GFS and the mounted FS resource colocate together

[onenode] # sudo crm configure colocation colFSGFSD inf: cloneFS cloneGFSD

 - Configure start order, so that cloneFS is only loaded after cloneGFSD

[onenode] # sudo crm configure order ordGFSDFS 0: cloneGFSD cloneFS
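
 - At this point the GFS2 file system should be mounted on /data on every node; a quick check on each node, for example:

[allnodes] # sudo crm_mon -1
[allnodes] # mount | grep /data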


###### This next section is currently untested & incomplete ######

[one node] 16. Add a Virtual Domain resource
 
- Configure & install kvm & libvirt and install a domain (guest) and setup & test live migration  - out of scope of this howto

Once you have a guest installed onto a shared storage device and live migration configured, you can add your VM as a cluster resource (the resource name must be unique; here I call mine 'VM').

 # sudo crm configure primitive VM ocf:heartbeat:VirtualDomain params config="/path/to/configuration/file" \
    hypervisor="" migration_transport="" op monitor interval="1min"
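
Once the resource is running, live migration can then be exercised through the cluster rather than directly via libvirt, for example (resource and node names as used in this how-to):

 # sudo crm resource migrate VM cluster-node-2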


17. Testing (example commands for some of these checks are shown after the list)

- Fencing
- Moving services
- Failing nodes
- Multipath Failure
- Network Failure
- CLVM
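
A few starting points for these tests (resource and node names from this how-to; fencing and multipath tests depend on your hardware):

- Moving services:
 # sudo crm resource migrate Website cluster-node-2

- Failing nodes (put a node in standby and bring it back):
 # sudo crm node standby cluster-node-1
 # sudo crm node online cluster-node-1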

Reference Material:
https://wiki.ubuntu.com/ClusterStack/LucidTesting
http://www.clusterlabs.org/doc/en-US/Pacemaker/1.0/html-single/Pacemaker_Explained/
http://www.novell.com/documentation/sle_ha/