Red Hat Multipathing
This is a short document describing the Device Mapper Multipath (DM-Multipath) which is a feature of Red Hat Linux REL 5. DM-Multipath allows you to configure multiple I/O paths from your server to your storage device (SAN, etc). These are physical paths that include HBA, cables and switches. Multipathing aggregates the I/O paths, creating a new device that consists of the aggregated paths.
DM-Multipathing can provide the following
Redundancy | DM-Multipathing can provide failover capabilities, this can be configured in two ways
|
Improved performance | DM-Multipathing can detect loading on the I/O paths and dynamically re-balance the load. |
DM-Multipathing by default has support for most of the common SAN arrays, all devices supported are stored in a file multipath.conf.defaults. You can add support into the multipath configuration file /etc/multipath.conf but more on this alter.
The following are the components that make up the multipathing
Kernel components |
|
dm-multipath kernel module | reroutes I/O and supports failover for paths and path groups |
multipathd daemon | Monitors paths and takes action when necessary (failed paths, restored paths) |
Multipath Configuration Files |
|
multipath.conf | The main configuration file, this is located in /etc. |
multipath.conf.defaults | Lists support storage arrays, if your array is not listed it still may be possible to configure it in the multipath.conf file . The location of the file is /usr/share/device-mapper-multipath-???? depending on the version you have. |
bindings | This file is located in /var/lib/multipath and can be used to define user-friendly names and make devices consistent across nodes. |
Multipath commands |
|
multipath | List and configures multipath devices |
kpartx | Creates device mapper devices for partitions on a device |
Each multipath device has a World Wide Identifier (WWID), which is guaranteed to be unique and unchanging (I will discuss DR later which means that the WWID will change). By default the name of multipath device is set to its WWID but there is a option called user_friendly_names which sets the alias to a node-unique name of the form of mpathn.
For example a server has two additional SAN LUN's connected to the server the outcome may be as follows
device name | WWID | dm-multipath names |
/dev/sdb | 20017380006c00009 | /dev/mapper/mpath1 /dev/mpath/mpath1 /dev/dm-1 |
/dev/sdc | ||
/dev/sdd | 20017380006c0000a | /dev/mapper/mpath2 /dev/mpath/mpath2 /dev/dm-2 |
/dev/sde |
/dev/sdb and /dev/sdc are in fact the same disk (same WWID), thus multipath will create one unique name mpath1, if either of the paths fail the server will be unaffected.
As you can see multipath creates three different ways to access the device
/dev/mapper/mpathn | These are create early in the boot sequence, thus these are ideal for logical volumes, boot devices |
/dev/mpathn | are provided as a convenience so that all multipathed devices can be seen in one directory. These devices are created by the udev device manager and may not be available during startup. |
/dev/dm-n | these are for external use only, however when I installed a fresh Red Hat these were used? |
You also have a fourth option and that is to set an alias in the multipath.conf file. A multipath device has a number of attributes which can be changed in the multipath.conf file, we will see some examples of this later.
To setup DM-Multipathing the procedure is as follows
A basic multipath.conf file is below,
multipath.conf (basic) | defaults { user_friendly_names yes path_group_policy failover } blacklist { # Make sure our multipath devices are enabled. blacklist_exceptions { multipaths { |
Once you have configured your multipath.conf, then perform the below to start multipathd
Hopefully you should now a list similar to below, each device is active and ready.
multipath -ll (simple) | ## This example shows my connections to a IBM XIV SAN, yours may look different server1> multipath -ll |grep mpath mpath2 (20017380006c00036) dm-7 IBM,2810XIV mpath1 (20017380006c00035) dm-6 IBM,2810XIV mpath0 (20017380006c00034) dm-5 IBM,2810XIV mpath3 (20017380006c00037) dm-8 IBM,2810XIV |
multipath -ll (detailed) | ## This example shows my connections to a IBM XIV SAN, yours may look different mpath2 (20017380006c00036) dm-7 IBM,2810XIV mpath1 (20017380006c00035) dm-6 IBM,2810XIV mpath0 (20017380006c00034) dm-5 IBM,2810XIV mpath3 (20017380006c00037) dm-8 IBM,2810XIV |
If you have made a mistake in the multipath.conf file use the below to correct it
On occasion you may come across where the array you have is not in the multipath.conf.defaults file, you can add a devices section, you may need to consult the manufactures documentation on what you may need in the multipath.conf file, below is an example of a HP OPEN-V series array.
non-default array support | ## add to the multipath.conf file devices { device { vendor "HP" product "OPEN-V." getuid_callout "/sbin/scsi_id -g -u -p0x80 -s /block/%n" } } Note: make sure you consult the release notes regarding your array |
The configuration file is divided into the following sections
You can blacklist any device you may wish but you need to tell multipath what to exclude, some examples would be the following
wwid | ## specific wwid blacklist { wwid "20017380006c00034" } ## All wwid ## specific wwid blacklist { wwid "*" } |
device name | ## all sd devices a through z blacklist { devnode "^sd[a-z]" } ## A more advanced example blacklist { devnode "^(ram|raw|loop|fd|md|dm-|sr|scd|st)[0-9]*" devnode "^(hd|xvd|vd)[a-z]*" } |
device type | ## blacklist HP devices blacklist { device { vendor "HP" product "*" } } |
To exclude from the blacklist you create an exception list (blacklist_exception), its the same syntax as above
wwid | ## exclude a specific wwid blacklist_exceptions { wwid "20017380006c00034" } ## exclude all wwid ## specific wwid blacklist_exceptions { wwid "*" } |
device name | ## all sd devices x through z blacklist_exceptions { devnode "^sd[x-z]" } |
device type | ## exclude HP devices blacklist_exceptions { device { vendor "HP" product "*" } } |
The default section has a number of parameters which can be changed
Parameter | Default Value |
Description |
udev_dir | /udev |
Specifies the directory where udev device nodes are created. |
verbosity | 2 |
(RHEL 5.3 and later) Specifies the verbosity level of the command. It can be overridden by the -v command line option. |
polling_interval | 5 |
Specifies the interval between two path checks in seconds. |
selector | round-robin 0 |
Specifies the default algorithm to use in determining what path to use for the next I/O operation. |
path_grouping_policy | failover |
Specifies the default path grouping policy to apply to unspecified multipaths. Possible values include: failover = 1 path per priority group multibus = all valid paths in 1 priority group group_by_serial = 1 priority group per detected serial number group_by_prio = 1 priority group per path priority value group_by_node_name = 1 priority group per target node name |
getuid_callout | /sbin/scsi_id -g -u -s |
Specifies the default program and arguments to call out to obtain a unique path identifier. An absolute path is required. |
prio_callout | Specifies the the default program and arguments to call out to obtain a path weight. Weights are summed for each path group to determine the next path group to use in case of failue. "none" is a valid value. | |
path_checker | readsector0 |
Specifies the default method used to determine the state of the paths. Possible values include: readsector0, rdac, tur, cciss_tur, hp_tur (RHEL 5.5 and later), emc_clariion, hp_sw, and directio. |
features | The extra features of multipath devices. The only existing feature is queue_if_no_path, which is the same as setting no_path_retry to queue. | |
rr_min_io | 1000 |
Specifies the number of I/O requests to route to a path before switching to the next path in the current path group. |
max_fds | (RHEL 5.2 and later) Sets the maximum number of open file descriptors for the multipathd process. In RHEL 5.3, this option allows a value of max, which sets the number of open file descriptors to the system maximum. | |
rr_weight | uniform |
If set to priorities, then instead of sending rr_min_io requests to a path before calling selector to choose the next path, the number of requests to send is determined by rr_min_io times the path's priority, as determined by the prio_callout program. Currently, there are priority callouts only for devices that use the group_by_prio path grouping policy, which means that all the paths in a path group will always have the same priority. If set to uniform, all path weights are equal. |
failback | manual |
Specifies path group failback. A value of 0 or immediate specifies that as soon as there is a path group with a higher priority than the current path group the system switches to that path group. A numeric value greater than zero specifies deferred failback, expressed in seconds. A value of manual specifies that failback can happen only with operator intervention. |
no_path_retry | null |
A numeric value for this attribute specifies the number of times the system should attempt to use a failed path before disabling queueing. A value of fail indicates immediate failure, without queuing. A value of queue indicates that queuing should not stop until the path is fixed. |
flush_on_last_del | no |
(RHEL 5.3 and later) If set to yes, the multipathd daemon will disable queueing when the last path to a device has been deleted. |
queue_without_daemon | yes |
(RHEL 5.3 and later) If set to no, the multipathd daemon will disable queueing for all devices when it is shut down. |
user_friendly_names | no |
If set to yes, specifies that the system should using the bindings file to assign a persistent and unique alias to the multipath, in the form of mpathn. The default location of the bindings file is /var/lib/multipath/bindings, but this can be changed with the bindings_file option. If set to no, specifies that the system should use use the WWID as the alias for the multipath. In either case, what is specified here will be overriden by any device-specific aliases you specify in the multipaths section of the configuration file. |
bindings_file | /var/lib/multipath/bindings |
(RHEL 5.2 and later) The location of the bindings file that is used with the user_friend_names option. |
mode | The default value is determined by the process. |
(RHEL 5.3 and later) The mode to use for the multipath device nodes, in octal. |
uid | The default value is determined by the process. |
(RHEL 5.3 and later) The user ID to use for the multipath device nodes. You must use the numeric user ID. |
gid | The default value is determined by the process. |
(RHEL 5.3 and later) The group ID to use for the multipath device nodes. You must use the numeric group ID. |
checker_timeout | The default value is taken from sys/block/sdx/device/timeout. |
(RHEL 5.5 and later) The timeout value to use for path checkers that issue SCSI commands with an explicit timeout, in seconds. |
The multipaths section parameters are as follows
Parameter | Description |
wwid | Specifies the WWID of the multipath device to which the multipath attributes apply. |
alias | Specifies the symbolic name for the multipath device to which the multipath attributes apply. |
path_group_policy prio_callout path_selector failback rr_weight no_path_retry flush_on_last_del rr_min_io mode uid_gid |
All the same as the defaults table |
The devices section parameters are as follows
Parameter | Description |
vendor | Specifies the vendor name of the storage device to which the device attributes apply, for example COMPAQ. |
product | Specifies the product name of the storage device to which the device attributes apply, for example HSV110 (C)COMPAQ. |
path_checker | Specifies the default method used to determine the state of the paths. Possible values include readsector0, rdac, tur, cciss_tur, hp_tur, emc_clariion, hp_sw, and directio. |
features | The extra features of multipath devices. The only existing feature is queue_if_no_path, which is the same as setting no_path_retry to queue. |
hardware_handler | Specifies a module that will be used to perform hardware specific actions when switching path groups or handling I/O errors. Possible values include 0, 1 emc, and 1 rdac. The default value is 0. |
product_backlist | Specifies a regular expression used to blacklist devices by product. |
path_group_policy getuid_callout prio_callout path_selector failback rr_weight no_path_retry flush_on_last_del rr_min_io |
All the same as defaults table |
Administration and Troubleshooting
Once you have multipathing configured and running there are a number of administering tasks that you may need to perform or use
To resize a online multipath device you must follow the procedure below
Resize online multpath LUN |
||
1 | Resize the physical device | ## use whatever method your SAN uses to resize the physical device |
2 | find the paths of the devices | multipath -l |
3 | Resize the paths | ## For SCSI devices writing a 1 (one) to the rescan file for the device causes the SCSI driver to rescan echo 1 > /sys/block/<device_name>/device/rescan |
4 | Resize the multipath device | multipathd -k'resize map mpath0' |
5 | resize the filesystem | ## use the method for the appropriate filesystem type # I am assuming that no LVM or DOS partition is being used resize2fs /dev/mapper/mpath0 |
When a large number of LUNs are being used multipathed devices can increase the time it takes for the udev device manager to create devices nodes for them. You can correct this problem by "deleting/commenting out" the below line from the /etc/udev/rules.d/40-multipath.rules file. This line causes the udev device manager to run multipath every time a block device is added to the node. Even with this line removed, the multipathd daemon will still automatically create multipathed devices, and multipath will still be called during the boot process for nodes with multipathed root file systems. The only change is that multipathed devices will not be automatically created when the multipathd daemon is not running, which should not be a problem for the vast majority of multipath users.
/etc/udev/40-multipath_rules | KERNEL!="dm-[0-9]*", ACTION=="add", PROGRAM=="/bin/bash -c '/sbin/lsmod | /bin/grep ^dm_multipath'", RUN+="/sbin/multipath -v0 %M:%m" |
If features "1 queue_if_no_path" is specified in the /etc/multipath.conf file, then any process that issues I/O will hang until one or more paths are restored. To avoid this, set the no_path_retry N parameter in the /etc/multipath.conf file (where N is the number of times the system should retry a path).
If you need to use the features "1 queue_if_no_path" option and you experience the issue noted here, use the dmsetup command to edit the policy at runtime for a particular LUN (that is, for which all the paths are unavailable). For example, if you want to change the policy on the multipath device mpath2 from "queue_if_no_path" to "fail_if_no_path", execute the following command.
change policy on multipath device | dmsetup message mpath2 0 "fail_if_no_path" Note: you must specify the mpathn alias rather than the path |
The multipath command outputs data in the format as below
multipath device | action_if_any: alias (wwid_if_different_from_alias) [size][features][hardware_handler] |
path group | \_ scheduling_policy [path_group_priority_if_known] [path_group_status_if_known] |
path | \_ host:channel:id:lun devnode major:minor [path_status] [dm_status_if_known] |
The multipath command has a number of options, which vary the detail of the output and can remove devices
multipath topology gathered from information in sysfs and the device mapper | multipath -l |
displays as above plus in addition to all other available components | multipath -ll |
no output | multipath -v0 |
outputs the created or updated multipath names only | multipath -v1 |
prints all detected paths, multipath and maps | multipath -v2 |
Remove the named device from multipath | multipath -f <device> |
Remove all multipath devices | multipath -F |
dry run, don't actually change anything | multipath -d |
force maps to specified policy | multipath -p <policy> Note: policies are failover multibus group_by_serial group_by_prio group_by_node_name |
You can also obtain details of the multipathn devices by using the dmsetup command
dmsetup | ## List major, minor numbers dmsetup ls |
The last command you can use is the multipathd command
multipathd | ## Start interactive session ## Here is the help page CLI commands reference: |
I had a configuration where a server would be required to connect to two SAN's (one LIVE and one DR), the DR SAN was a mirror copy of the LIVE SAN. Both sets of SAN LUNs had different WWIDs, thus the configuration had to be changed when DR was invoked.
Disk | LIVE SAN wwid | DR SAN wwid |
root | 20017380006be0008 | 20017380006c00003 |
swap | 20017380006be0009 | 20017380006c00004 |
The problem I had was when we cutover to the DR SAN the wwid did not match the existing configuration, and thus the multipathing was not working correctly, in order to get it working correctly three files must be changed
Firstly backup all the above files to a safe place as they will be needed when you cut back over to the LIVE environment.
/etc/multipath.conf | Change the /etc/multipath.conf to reflect the new DR wwid numbers |
/var/lib/bindings | Then edit the /var/lib/bindings file to reflect the new DR wwid numbers, if you used aliasing in the multipath.conf then this file may not exist. |
/boot/initrd file | The initrd (initialize ram disk) also has both the above files buried into it, thus we must unzip and un-cpio the image file make the changes and then re-cpio and zip it back again, the steps are below
When the server reboots the new wwid should be in place and the server should use the correct multipathing for the DR SAN. |
When you cut back to the LIVE server just copy the original files back and reboot the server.