Sun Cluster Cheat Sheet
This cheat sheet contains common commands and information for both Sun Cluster 3.1 and 3.2. There is some missing information (zones, NAS devices, etc.) which I hope to complete over time.
Both versions of Cluster also have a text-based menu tool (scsetup in 3.1, clsetup in 3.2), so don't be afraid to use it, especially if the task is a simple one.
All of the commands in version 3.1 are also available in version 3.2.
Daemons and Processes
At the bottom of the installation guide I listed the daemons and processes running after a fresh install; now is the time to explain what these processes do. I have managed to obtain information on most of them but am still looking for the others.
Versions 3.1 and 3.2
clexecd | This is used by cluster kernel threads to execute userland commands (such as the run_reserve and dofsck commands). It is also used to run cluster commands remotely (like the cluster shutdown command). This daemon registers with failfastd so that a failfast device driver will panic the kernel if this daemon is killed and not restarted in 30 seconds. |
cl_ccrad | This daemon provides access from userland management applications to the CCR. It is automatically restarted if it is stopped. |
cl_eventd | The cluster event daemon registers and forwards cluster events (such as nodes entering and leaving the cluster). There is also a protocol whereby user applications can register themselves to receive cluster events. The daemon is automatically respawned if it is killed. |
cl_eventlogd | The cluster event log daemon logs cluster events into a binary log file. At the time of writing there is no published interface to this log. It is automatically restarted if it is stopped. |
failfastd | This daemon is the failfast proxy server. The failfast daemon allows the kernel to panic if certain essential daemons have failed. |
rgmd | The resource group management daemon which manages the state of all cluster-unaware applications. A failfast driver panics the kernel if this daemon is killed and not restarted in 30 seconds. |
rpc.fed | This is the fork-and-exec daemon, which handles requests from rgmd to spawn methods for specific data services. A failfast driver panics the kernel if this daemon is killed and not restarted in 30 seconds. |
rpc.pmfd | This is the process monitoring facility. It is used as a general mechanism to initiate restarts and failure action scripts for some cluster framework daemons (in Solaris 9 OS), and for most application daemons and application fault monitors (in Solaris 9 and 10 OS). A failfast driver panics the kernel if this daemon is stopped and not restarted in 30 seconds. |
pnmd | The public network management service daemon manages network status information received from the local IPMP daemon running on each node and facilitates application failovers caused by complete public network failures on nodes. It is automatically restarted if it is stopped. |
scdpmd | The disk path monitoring daemon monitors the status of disk paths, so that they can be reported in the output of the cldev status command. This multi-threaded daemon runs on each node and is automatically started by an rc script when a node boots. It monitors the availability of logical paths that are visible through the various multipath drivers (MPxIO, HDLM, PowerPath, etc.). It is automatically restarted by rpc.pmfd if it dies. |
Version 3.2 only
qd_userd | This daemon serves as a proxy whenever any quorum device activity requires execution of a userland command (e.g. a NAS quorum device). |
cl_execd | |
ifconfig_proxy_serverd | |
rtreg_proxy_serverd | |
cl_pnmd | This is the public network management (PNM) daemon. It is started at boot time and starts the PNM service. It keeps track of the local host's IPMP state and facilitates inter-node failover for all IPMP groups. |
scprivipd | This daemon provisions IP addresses on the clprivnet0 interface, on behalf of zones. |
sc_zonesd | This daemon monitors the state of Solaris 10 non-global zones so that applications designed to fail over between zones can react appropriately to zone boot failures. |
cznetd | It is used for reconfiguring and plumbing the private IP addresses in a local zone after a virtual cluster is created; see also the cznetd.xml file. |
rpc.fed | This is the "fork and exec" daemon which handles requests from rgmd to spawn methods for specific data services. A failfast driver panics the kernel if this daemon is killed and not restarted in 30 seconds. |
scqdmd | The quorum server daemon; this was possibly called "scqsd" in earlier releases. |
pnm mod serverd |
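A quick way to sanity-check that the core framework daemons above are actually present on a node (a minimal sketch; the egrep pattern only lists some of the daemons from the tables above, extend it as needed):
## list the core cluster framework daemons running on this node
ps -ef | egrep 'clexecd|cl_eventd|failfastd|rgmd|rpc.fed|rpc.pmfd|pnmd|scdpmd'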
File locations
Both Versions (3.1 and 3.2)
man pages | /usr/cluster/man |
log files | /var/cluster/logs /var/adm/messages |
Configuration files (CCR, eventlog, etc) | /etc/cluster/ |
Cluster and other commands | /usr/cluster/lib/sc |
Version 3.1 Only
sccheck logs | /var/cluster/sccheck/report.<date> |
Cluster infrastructure file | /etc/cluster/ccr/infrastructure |
Version 3.2 Only
sccheck logs | /var/cluster/logs/cluster_check/remote.<date> |
Cluster infrastructure file | /etc/cluster/ccr/global/infrastructure |
Command Log | /var/cluster/logs/commandlog |
SCSI Reservations
Display reservation keys | scsi2: scsi3: |
determine the device owner | scsi2: scsi3: |
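The two rows above are incomplete. As a sketch, I believe the low-level helpers shipped under /usr/cluster/lib/sc are what is normally used here, but treat the binary names, options and the example DID device d4 as assumptions and verify them on your own installation:
## display reservation keys - SCSI-2 (PGRE) and SCSI-3
/usr/cluster/lib/sc/pgre -c pgre_inkeys -d /dev/did/rdsk/d4s2
/usr/cluster/lib/sc/scsi -c inkeys -d /dev/did/rdsk/d4s2
## determine the device owner - SCSI-2 (PGRE) and SCSI-3
/usr/cluster/lib/sc/pgre -c pgre_inresv -d /dev/did/rdsk/d4s2
/usr/cluster/lib/sc/scsi -c inresv -d /dev/did/rdsk/d4s2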
Command shortcuts
In version 3.2 there are a number of shortcut command names, which I have detailed below. I have left the full command names in the rest of the document so it is obvious what we are performing. All the commands are located in /usr/cluster/bin.
Full command | Shortcut
cldevice | cldev |
cldevicegroup | cldg |
clinterconnect | clintr |
clnasdevice | clnas |
clquorum | clq |
clresource | clrs |
clresourcegroup | clrg |
clreslogicalhostname | clrslh |
clresourcetype | clrt |
clressharedaddress | clrssa |
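The shortcuts are just alternative names for the same commands, so (assuming /usr/cluster/bin is in your PATH) the following two commands are equivalent:
## long form
cldevice status
## shortcut form
cldev status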
Shutting down and Booting a Cluster
3.1 | 3.2
shutdown entire cluster (all nodes will be brought down to init 0) | scshutdown -y -g0 | ## run from any node, shuts down every node in the cluster cluster shutdown -g0 -y |
shutdown single node | scswitch -S -h <host> shutdown -i5 -g0 -y | clnode evacuate <node> shutdown -i5 -g0 -y |
reboot a node into non-cluster mode | ok> boot -x | ok> boot -x |
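Putting the rows above together, a typical 3.2 sequence for taking a single node down for maintenance and bringing it back up outside the cluster (the node name node1 is illustrative):
## move resource groups and device groups off the node
clnode evacuate node1
## shut the node down cleanly
shutdown -g0 -y -i0
## from the OBP, boot it into non-cluster mode for the maintenance work
ok> boot -x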
Cluster information
3.1 | 3.2
Cluster | scstat -pv | cluster list -v cluster show cluster status |
Nodes | scstat -n | clnode list -v clnode show clnode status |
Devices | scstat -D | cldevice list cldevice show cldevice status |
Quorum | scstat -q | clquorum list -v clquorum show clquorum status |
Transport info | scstat -W | clinterconnect show clinterconnect status |
Resources | scstat -g | clresource list -v clresource show clresource status |
Resource Groups | scstat -g scrgadm -pv | clresourcegroup list -v |
Resource Types | | clresourcetype list -v clresourcetype list-props -v clresourcetype show |
IP Networking Multipathing | scstat -i | clnode status -m |
Installation info (prints packages and version) | scinstall -pv | clnode show-rev -v |
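As a quick post-change health check on 3.2, the status commands from the table above can simply be run back to back:
## overall cluster, node, quorum, device and resource group health
cluster status
clnode status
clquorum status
cldevice status
clresourcegroup status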
Cluster Configuration
3.1 | 3.2
Release | cat /etc/cluster/release | |
Integrity check | sccheck | cluster check -v |
Configure the cluster (add nodes, add data services, etc) | scinstall | scinstall |
Cluster configuration utility (quorum, data services, resource groups, etc) | scsetup | clsetup |
Rename | cluster rename -c <cluster_name> | |
Set a property | cluster set -p <name>=<value> | |
List | ## List cluster commands cluster list-cmds ## Display the name of the cluster cluster list ## List the checks cluster list-checks ## Detailed configuration cluster show -t global |
|
Status | cluster status | |
Reset the cluster private network settings | cluster restore-netprops <cluster_name> | |
Place the cluster into install mode | cluster set -p installmode=enabled | |
Add a node | scconf -a -T node=<host> | clnode add -c <clustername> -n <nodename> -e endpoint1,endpoint2 -e endpoint3,endpoint4 |
Remove a node | scconf -r -T node=<host> | clnode remove |
Prevent new nodes from entering | scconf -a -T node=. | |
Put a node into maintenance state | scconf -c -q node=<node>,maintstate Note: use the scstat -q command to verify that the node is in maintenance mode; the vote count should be zero for that node. | |
Get a node out of maintenance state | scconf -c -q node=<node>,reset Note: use the scstat -q command to verify that the node is out of maintenance mode; the vote count should be one for that node. |
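A short worked example of the maintenance-state rows above on 3.1 (node2 is illustrative; the node itself normally has to be down or in non-cluster mode first):
## put the node into maintenance state and check its vote count drops to zero
scconf -c -q node=node2,maintstate
scstat -q
## when the node is ready to rejoin, reset it and check the vote count returns to one
scconf -c -q node=node2,reset
scstat -q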
Node Configuration
3.1 | 3.2
Add a node to the cluster | clnode add [-c <cluster>] [-n <sponsornode>] \ -e <endpoint> \ -e <endpoint> <node> |
|
Remove a node from the cluster | ## Make sure you are on the node you wish to remove clnode remove |
|
Evacuate a node from the cluster | scswitch -S -h <node> | clnode evacuate <node> |
Cleanup the cluster configuration (used after removing nodes) | clnode clear <node> | |
List nodes | ## Standard list clnode list ## Detailed list clnode show |
Change a node's property | clnode set -p <name>=<value> [+|<node>] | |
Status of nodes | clnode status [+|<node>] |
Admin Quorum Device
Quorum votes come from both nodes and quorum devices, so the total quorum vote count is all of the node votes and device votes added together. You can use the scsetup (3.1) / clsetup (3.2) interface to add and remove quorum devices, or use the commands below.
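A worked example of the vote arithmetic (assuming the usual rule that a quorum device carries one vote fewer than the number of nodes it is connected to):
## two-node cluster with one quorum disk
node votes   = 2  (one per node)
device votes = 1  (2 connected nodes - 1)
total votes  = 3, majority needed = 2
## so a single surviving node plus the quorum disk can keep the cluster up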
3.1 | 3.2
Adding a SCSI device to the quorum | scconf -a -q globaldev=d11 Note: if you get the error message "unable to scrub device" use scgdevs to add the device to the global device namespace. | clquorum add [-t <type>] [-p <name>=<value>] [+|<devicename>] |
Adding a NAS device to the quorum | n/a | clquorum add -t netapp_nas -p filer=<nasdevice>,lun_id=<IDnum> <nasdevice> |
Adding a Quorum Server | n/a | clquorum add -t quorumserver -p qshost=<IPaddress>,port=<portnumber> <quorumservername> |
Removing a device from the quorum | scconf -r -q globaldev=d11 | clquorum remove [-t <type>] [+|<devicename>] |
Remove the last quorum device | ## Evacuate all nodes ## Put cluster into maint mode scconf -c -q installmode ## Remove the quorum device scconf -r -q globaldev=d11 ## Check the quorum devices scstat -q | ## Place the cluster in install mode cluster set -p installmode=enabled ## Remove the quorum device clquorum remove <device> ## Verify the device has been removed clquorum list -v |
List | scstat -q | ## Standard list clquorum list -v ## Detailed list clquorum show ## Status clquorum status |
Resetting quorum info | scconf -c -q reset Note: this will bring all offline quorum devices online | clquorum reset |
Bring a quorum device into maintenance mode (known as disabled in 3.2) | ## Obtain the device number scdidadm -L ## Put the device into maintenance scconf -c -q globaldev=<device>,maintstate | clquorum disable [-t <type>] [+|<devicename>] |
Bring a quorum device out of maintenance mode (known as enabled in 3.2) | scconf -c -q globaldev=<device>,reset | clquorum enable [-t <type>] [+|<devicename>] |
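For example, adding a shared disk as a quorum device on 3.2 and checking the votes (the DID device d11 is carried over from the 3.1 examples above and is illustrative):
## add the DID device as a quorum device
clquorum add d11
## confirm the device and the vote counts
clquorum status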
Device Configuration
3.1 | 3.2
Check device | cldevice check [-n <node>] [+] | |
Remove all devices from node | cldevice clear [-n <node>] | |
Monitoring | ## Turn on monitoring cldevice monitor [-n <node>] [+|<device>] ## Turn off monitoring cldevice unmonitor [-n <node>] [+|<device>] |
Rename | cldevice rename -d <destination_device_name> <device> | |
Replicate | cldevice replicate [-S <source-node>] -D <destination-node> [+] | |
Set properties of a device | cldevice set -p default_fencing={global|pathcount|scsi3} [-n <node>] <device> | |
Status | ## Standard display cldevice status ## Display failed disk paths cldevice status -s fail |
Lists all the configured devices including paths across all nodes. | scdidadm -L | ## Standard list cldevice list [-n <node>] [+|<device>] ## Detailed list cldevice show [-n <node>] [+|<device>] |
List all the configured devices including paths on the local node only. | scdidadm -l | see above |
Reconfigure the device database, creating new instance numbers if required. | scdidadm -r | cldevice populate cldevice refresh [-n <node>] [+] |
Perform the repair procedure for a particular path (use this when a disk gets replaced) | ## by device name scdidadm -R c0t0d0s0 ## by device id scdidadm -R 2 | cldevice repair [-n <node>] [+|<device>] |
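After physically replacing a failed disk, the rows above are typically combined like this on 3.2 (the DID instance d2 is illustrative):
## rebuild the DID device database so the new disk is picked up
cldevice populate
## run the repair procedure against the replaced DID device
cldevice repair d2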
Disk groups
3.1 | 3.2
Create a device group | n/a | cldevicegroup create -t vxvm -n <node-list> -p failback=true <devgrp> |
Remove a device group | n/a | cldevicegroup delete <devgrp> |
Adding | scconf -a -D type=vxvm,name=appdg,nodelist=<host>:<host>,preferenced=true | cldevicegroup add-device -d <device> <devgrp> |
Removing | scconf -r -D name=<disk group> | cldevicegroup remove-device -d <device> <devgrp> |
Set a property | cldevicegroup set [-p <name>=<value>] [+|<devgrp>] | |
List | scstat | ## Standard list cldevicegroup list ## Detailed configuration report cldevicegroup show |
Status | scstat | cldevicegroup status [-n <node>] [-t <type>] [+|<devgrp>] |
adding single node | scconf -a -D type=vxvm,name=appdg,nodelist=<host> | cldevicegroup add-node [-n <node>] [-t <type>] [+|<devgrp>] |
Removing single node | scconf -r -D name=<disk group>,nodelist=<host> | cldevicegroup remove-node [-n <node>] [-t <type>] [+|<devgrp>] |
Switch | scswitch -z -D <disk group> -h <host> | cldevicegroup switch -n <nodename> <devgrp> |
Put into maintenance mode | scswitch -m -D <disk group> | n/a |
take out of maintenance mode | scswitch -z -D <disk group> -h <host> | n/a |
onlining a disk group | scswitch -z -D <disk group> -h <host> | cldevicegroup online <devgrp> |
offlining a disk group | scswitch -F -D <disk group> | cldevicegroup offline <devgrp> |
Resync a disk group | scconf -c -D name=appdg,sync | cldevicegroup sync [-t <type>] [+|<devgrp>] |
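Putting several of the rows above together, a typical 3.2 sequence for a new VxVM device group (the group name appdg and the node names are illustrative):
## register the VxVM disk group as a cluster device group
cldevicegroup create -t vxvm -n node1,node2 -p failback=true appdg
## bring it online and check which node is primary
cldevicegroup online appdg
cldevicegroup status appdg
## later, move it to the other node
cldevicegroup switch -n node2 appdg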
Transport Cable
3.1 | 3.2
Add | clinterconnect add <endpoint>,<endpoint> | |
Remove | clinterconnect remove <endpoint>,<endpoint> | |
Enable | scconf -c -m endpoint=<host>:qfe1,state=enabled | clinterconnect enable [-n <node>] [+|<endpoint>,<endpoint>] |
Disable | scconf -c -m endpoint=<host>:qfe1,state=disabled Note: it gets deleted | clinterconnect disable [-n <node>] [+|<endpoint>,<endpoint>] |
List | scstat | ## Standard and detailed list clinterconnect show [-n <node>][+|<endpoint>,<endpoint>] |
Status | scstat | clinterconnect status [-n <node>][+|<endpoint>,<endpoint>] |
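For example, after replacing a private interconnect cable on 3.2 (the endpoint names are illustrative):
## check which transport paths are up
clinterconnect status
## re-enable the repaired path
clinterconnect enable node1:qfe1,node2:qfe1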
Resource Groups
3.1 | 3.2
Adding (failover) | scrgadm -a -g <res_group> -h <host>,<host> | clresourcegroup create <res_group> |
Adding (scalable) | clresourcegroup create -S <res_group> | |
Adding a node to a resource group | clresourcegroup add-node -n <node> <res_group> | |
Removing | scrgadm -r -g <res_group> | ## Remove a resource group clresourcegroup delete <res_group> ## Remove a resource group and all its resources clresourcegroup delete -F <res_group> |
Removing a node from a resource group | clresourcegroup remove-node -n <node> <res_group> | |
changing properties | scrgadm -c -g <res_group> -y <property>=<value> | clresourcegroup set -p <name>=<value> [+|<res_group>] (e.g. -p Failback=true) |
Status | scstat -g | clresourcegroup status [-n <node>][-r <resource>][-s <state>][-t <resourcetype>][+|<res_group>] |
Listing | scstat -g | clresourcegroup list [-n <node>][-r <resource>][-s <state>][-t <resourcetype>][+|<res_group>] |
Detailed List | scrgadm -pv -g <res_group> | clresourcegroup show [-n <node>][-r <resource>][-s <state>][-t <resourcetype>][+|<res_group>] |
Display mode type (failover or scalable) | scrgadm -pv -g <res_group> | grep 'Res Group mode' | |
Offlining | scswitch -F -g <res_group> | ## All resource groups clresourcegroup offline + ## Individual group clresourcegroup offline <res_group> |
Onlining | scswitch -Z -g <res_group> | ## All resource groups clresourcegroup online + ## Individual group clresourcegroup online <res_group> |
Evacuate all resource groups from a node (used when shutting down a node) | clresourcegroup evacuate [+|-n <node>] | |
Unmanaging | scswitch -u -g <res_group> Note: all resources in the group must be disabled first | clresourcegroup unmanage <res_group> |
Managing | scswitch -o -g <res_group> | clresourcegroup manage <res_group> |
Switching | scswitch -z -g <res_group> -h <host> | clresourcegroup switch -n <node> <res_group> |
Suspend | n/a | clresourcegroup suspend [+|<res_group>] |
Resume | n/a | clresourcegroup resume [+|<res_group>] |
Remaster (move the resource group/s to their preferred node) | n/a | clresourcegroup remaster [+|<res_group>] |
Restart a resource group (bring offline then online) | n/a | clresourcegroup restart [-n <node>] [+|<res_group>] |
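As a worked example of the 3.2 rows above, creating a failover resource group and bringing it under cluster control (the group name oracle-rg and the node name are illustrative):
## create the resource group
clresourcegroup create oracle-rg
## put it under RGM control and bring it online
clresourcegroup manage oracle-rg
clresourcegroup online oracle-rg
## later, switch it to another node
clresourcegroup switch -n node2 oracle-rg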
Resources
3.1 | 3.2
Adding failover network resource | scrgadm -a -L -g <res_group> -l <logicalhost> | clreslogicalhostname create -g <res_group> <lh-resource> |
Adding shared network resource | scrgadm -a -S -g <res_group> -l <logicalhost> | clressharedaddress create -g <res_group> <sa-resource> |
Adding a failover apache application and attaching the network resource | scrgadm -a -j apache_res -g <res_group> -t SUNW.apache -y Network_resources_used=<logicalhost> -y Scalable=False -y Port_list=80/tcp -x Bin_dir=/usr/apache/bin | |
Adding a shared apache application and attaching the network resource | scrgadm -a -j apache_res -g <res_group> -t SUNW.apache -y Network_resources_used=<logicalhost> -y Scalable=True -y Port_list=80/tcp -x Bin_dir=/usr/apache/bin | |
Create a HAStoragePlus failover resource | scrgadm -a -g rg_oracle -j hasp_data01 -t SUNW.HAStoragePlus -x FileSystemMountPoints=/oracle/data01 -x AffinityOn=true | clresource create -t SUNW.HAStoragePlus -g <res_group> -p FileSystemMountPoints=<mount-point-list> -p AffinityOn=true <rs-hasp> |
Removing | scrgadm -r -j res-ip Note: must disable the resource first | clresource delete [-g <res_group>][-t <resourcetype>][+|<resource>] |
changing or adding properties | scrgadm -c -j <resource> -y <property>=<value> | ## Changing a property clresource set -p <name>=<value> <resource> ## Adding a value to a list property clresource set -p <name>+=<value> <resource> |
List | scstat -g | clresource list [-g <res_group>][-t <resourcetype>][+|<resource>] ## List properties clresource list-props [-g <res_group>][-t <resourcetype>][+|<resource>] |
Detailed List | scrgadm -pv -j res-ip scrgadm -pvv -j res-ip | clresource show [-n <node>] [-g <res_group>][-t <resourcetype>][+|<resource>] |
Status | scstat -g | clresource status [-s <state>][-n <node>] [-g <res_group>][-t <resourcetype>][+|<resource>] |
Disable resource monitor | scrgadm -n -M -j res-ip | clresource unmonitor [-n <node>] [-g <res_group>][-t <resourcetype>][+|<resource>] |
Enable resource monitor | scrgadm -e -M -j res-ip | clresource monitor [-n <node>] [-g <res_group>][-t <resourcetype>][+|<resource>] |
Disabling | scswitch -n -j res-ip | clresource disable <resource> |
Enabling | scswitch -e -j res-ip | clresource enable <resource> |
Clearing a failed resource | scswitch -c -h <host>,<host> -j <resource> -f STOP_FAILED | clresource clear -f STOP_FAILED <resource> |
Find the network of a resource | scrgadm -pvv -j <resource> | grep -i network | |
Removing a resource and resource group | ## offline the group scswitch -F -g rgroup-1 ## remove the resource scrgadm -r -j res-ip ## remove the resource group scrgadm -r -g rgroup-1 | ## offline the group clresourcegroup offline <res_group> ## remove the resource clresource delete [-g <res_group>][-t <resourcetype>][+|<resource>] ## remove the resource group clresourcegroup delete <res_group> |
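Pulling the 3.2 rows above together, a sketch of building a small failover service (the names app-rg, app-lh, app-hasp and the mount point are illustrative; SUNW.HAStoragePlus must already be registered, see the resource type commands below):
## create the group and its logical hostname resource
clresourcegroup create app-rg
clreslogicalhostname create -g app-rg app-lh
## add the storage resource
clresource create -t SUNW.HAStoragePlus -g app-rg -p FileSystemMountPoints=/app/data app-hasp
## make sure the resources are enabled, then bring the group online
clresource enable app-lh
clresource enable app-hasp
clresourcegroup manage app-rg
clresourcegroup online app-rg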
Resource Types
3.1 | 3.2
Adding (register in 3.2) | scrgadm -a -t <resource type> e.g. SUNW.HAStoragePlus | clresourcetype register <type> |
Register a resource type to a node | n/a | clresourcetype add-node -n <node> <type> |
Deleting (remove in 3.2) | scrgadm -r -t <resource type> | clresourcetype unregister <type> |
Deregistering a resource type from a node | n/a | clresourcetype remove-node -n <node> <type> |
Listing | scrgadm -pv | grep 'Res Type name' | clresourcetype list [<type>] |
Listing resource type properties | clresourcetype list-props [<type>] | |
Show resource types | clresourcetype show [<type>] | |
Set properties of a resource type | clresourcetype set [-p <name>=<value>] <type> |
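For example, registering the HAStoragePlus type used in the resource examples earlier and confirming it is known to the cluster:
## register the type cluster-wide
clresourcetype register SUNW.HAStoragePlus
## confirm it is registered
clresourcetype list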