Clustering and Distributed Resource Scheduler (DRS)
Before we start looking at DRS and HA we need to create a VM cluster. The cluster provides services (DRS, HA) across a number of ESXi servers (nodes). To create a cluster, right-click on the Production Datacenter and select New Cluster.
We are taken to the cluster configuration screen. Here we choose which services the cluster should use, such as DRS, HA and vSAN (which I will cover in another section). You don't have to make up your mind straight away, as services can be added later.
Once the cluster has been created there are two additional steps to take: first we add the ESXi servers, then we configure the cluster's networking.
First we add the ESXi servers. As I have already added them to vCenter I simply select the existing servers, but you can add new ESXi servers as well.
The next screen is an ESXi server summary screen. Click Finish and the servers will be added to the cluster.
When finished you are returned to the summary screen; click Finish to add the ESXi servers to the cluster.
You can see in Recent Tasks at the bottom that the ESXi servers were added. Now we move on to configuring the cluster networking (vMotion): click Configure under "Configure cluster".
Here we set up a distributed switch, which will be configured on both ESXi servers. A port group will also be created, along with the VMkernel NICs that have the vMotion service selected.
We enter the IP addresses of the VMkernel NICs, along with the subnet mask and gateway.
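Before typing the addresses into the wizard it can help to sanity-check them. The following is a minimal sketch using Python's standard ipaddress module; the function name and the example addresses are my own placeholders and are not taken from this lab. It simply confirms that every VMkernel address and the gateway sit in the same subnet and that no address is duplicated.

```python
import ipaddress

def check_vmotion_addresses(vmk_ips, netmask, gateway=None):
    """Return a list of problems found in a proposed vMotion addressing plan.

    vmk_ips, netmask and gateway are plain strings (hypothetical inputs).
    An empty list means no problems were found.
    """
    problems = []
    # Derive the subnet from the first address; strict=False lets us pass
    # a host address instead of the network address.
    network = ipaddress.ip_network(f"{vmk_ips[0]}/{netmask}", strict=False)
    seen = set()
    for ip in vmk_ips:
        addr = ipaddress.ip_address(ip)
        if addr in seen:
            problems.append(f"duplicate address {ip}")
        seen.add(addr)
        if addr not in network:
            problems.append(f"{ip} is outside {network}")
    if gateway is not None and ipaddress.ip_address(gateway) not in network:
        problems.append(f"gateway {gateway} is outside {network}")
    return problems

# Example plan for two hosts (placeholder addresses)
print(check_vmotion_addresses(
    ["192.168.50.11", "192.168.50.12"], "255.255.255.0", "192.168.50.1"))
```

An empty list means the plan is at least internally consistent; it says nothing about whether the addresses are actually free on your network.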
On the next screen we can configure some of the cluster's advanced options. Here I have used the defaults; they can be changed at a later date and I will cover them in more detail in later sections.
Then we come to the summary screen; click Finish to configure the cluster's networking.
Once finished you will see both ESXi servers added to the cluster but in maintenance mode.
Going to the summary screen we can see more details on the cluster; once I start creating some more VMs we should start to see some activity on this screen. However, I point you to the warning in the middle of the screen about vSphere DRS functionality being in an unhealthy state. This is due to the cluster not having a quorum, which means it could suffer a split-brain problem.
Now we can take each of the ESXi servers out of maintenance mode; hopefully any health alerts will then disappear. The cluster is now ready. It's very simple to add additional hosts using the configuration screens above.
Distributed Resource Scheduler (DRS)
DRS is automated vMotion. When DRS recognizes an imbalance in the resources used on one ESXi server in a cluster, it rebalances the VMs among the servers in that cluster. DRS also handles initial placement, deciding where a VM should be powered on for the first time. It works alongside VMware HA: HA is in charge of detecting any crashes and making sure that the VM is started on another node within the cluster, and when the ESXi server is repaired DRS then rebalances the load again across the whole cluster.
There are a number of new features in DRS
When I first started using DRS I noticed that there was not an even number of VMs on each ESXi server within the cluster. An even VM count is not the intention of DRS: different VMs create different resource demands, and its primary goal is to keep a balanced load across the ESXi servers, so you may end up with more VMs on one ESXi server if their loads are not very heavy. DRS also won't keep moving VMs in order to keep the cluster perfectly balanced; only when the cluster becomes very unbalanced will it weigh up whether the penalty of a vMotion is worth the performance gain. DRS is clever enough to try to separate the large VMs (more CPU and memory) onto different ESXi servers and to move the smaller VMs if the balance is not right.
DRS has a threshold of up to 60 vMotion events per hour, and it checks for imbalances in the cluster once every five minutes. VMware also prevents "DRS storms". For example, if an ESXi server crashes, the VMs are started on other ESXi servers, DRS then sees an imbalance, and that could trigger a wave of further vMotion events as it tries to rebalance the cluster. This is prevented because DRS waits at least five minutes before checking the cluster again, and it only offers recommendations based on your migration threshold. This allows the administrator to control how aggressively DRS tries to rebalance the cluster.
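To make the "balanced load, not balanced VM count" idea concrete, here is a toy model in Python. It is only a sketch of the behaviour described above and not VMware's actual algorithm: the load units, the use of standard deviation as the imbalance measure, and all host and VM names are my own assumptions. It moves a VM from the busiest host to the least busy host only while the imbalance is above a threshold, and it caps the number of moves.

```python
from statistics import pstdev

def host_loads(hosts):
    """Total load per host; each host maps VM name -> load (arbitrary units)."""
    return {name: sum(vms.values()) for name, vms in hosts.items()}

def imbalance(hosts):
    """Spread of the host loads; 0 means perfectly balanced."""
    return pstdev(list(host_loads(hosts).values()))

def rebalance(hosts, threshold, max_moves=60):
    """Move VMs until the imbalance is within the threshold or moves run out."""
    moves = []
    while len(moves) < max_moves and imbalance(hosts) > threshold:
        loads = host_loads(hosts)
        src = max(loads, key=loads.get)   # busiest host
        dst = min(loads, key=loads.get)   # least busy host
        gap = loads[src] - loads[dst]
        # Only a VM whose load is smaller than the gap improves the balance.
        candidates = {vm: load for vm, load in hosts[src].items() if 0 < load < gap}
        if not candidates:
            break  # no move is worth its cost
        # Prefer the VM that brings the two hosts closest together.
        vm = min(candidates, key=lambda v: abs(gap - 2 * candidates[v]))
        hosts[dst][vm] = hosts[src].pop(vm)
        moves.append((vm, src, dst))
    return moves

hosts = {
    "esxi01": {"db01": 40, "app01": 30, "web01": 5, "web02": 5, "web03": 5, "web04": 5},
    "esxi02": {},
}
print(rebalance(hosts, threshold=6))                     # [('db01', 'esxi01', 'esxi02')]
print({name: len(vms) for name, vms in hosts.items()})   # {'esxi01': 5, 'esxi02': 1}
```

A single move of the heaviest VM is enough here: the hosts end up with loads of 50 and 40, one host holds five VMs and the other holds one, and the model stops rather than chasing a perfect balance.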
To see the configuration options of DRS, highlight the cluster and select the Configure tab, then select vSphere DRS. There are four areas that you can configure for DRS.
In the DRS Automation section you can change the automation level, migration threshold, Predictive DRS and Virtual Machine Automation.
You can choose from three different levels of automation (see below) and also set a migration threshold (see below). Predictive DRS hooks into vRealize Operations Manager and uses its stats to determine whether the cluster is imbalanced. You can also override the automation level for individual VMs if a VM has specific requirements.
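As a rough illustration of how the automation level and the migration threshold interact, here is a small Python sketch. It assumes the three levels are Manual, Partially Automated and Fully Automated, that recommendations carry a priority from 1 (most important) to 5, and that a threshold of N means priorities 1 to N are considered. The function and its return values are my own simplification, not a vSphere API.

```python
from enum import Enum

class AutomationLevel(Enum):
    MANUAL = 1
    PARTIALLY_AUTOMATED = 2
    FULLY_AUTOMATED = 3

def action_for(level, event, priority=1, threshold=3):
    """Return 'apply', 'recommend' or 'ignore' for a DRS decision.

    event is 'power_on' (initial placement) or 'migration' (rebalancing).
    priority and threshold both run from 1 to 5 (simplified model).
    """
    if event == "power_on":
        # Only Manual leaves the initial placement choice to the administrator.
        return "recommend" if level is AutomationLevel.MANUAL else "apply"
    if priority > threshold:
        # Below the configured aggressiveness: not worth acting on.
        return "ignore"
    # Only Fully Automated carries out migrations without approval.
    return "apply" if level is AutomationLevel.FULLY_AUTOMATED else "recommend"

print(action_for(AutomationLevel.PARTIALLY_AUTOMATED, "power_on"))            # apply
print(action_for(AutomationLevel.PARTIALLY_AUTOMATED, "migration"))           # recommend
print(action_for(AutomationLevel.FULLY_AUTOMATED, "migration", priority=4))   # ignore
```

Raising the threshold makes the last call return "apply", which is the "more aggressive" end of the slider.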
Next we have Additional Options, such as VM Distribution, CPU Over-Commitment and Scalable Shares.
In the Additional Options you can force a more even distribution of VMs across the cluster. This may, however, cause more load, as the cluster will always try to balance itself, moving VMs more regularly to keep that balance. CPU over-commitment in its simplest terms means allocating more resources to virtual workloads than are available at the physical level; the most commonly over-committed resources are memory and CPU. This option allows for a slight over-commitment before rebalancing, otherwise the cluster may rebalance because of a spike in a VM's resources. With Scalable Shares you can use resource pools to scale the cluster rebalancing yourself; this area can become complex and is rarely used.
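The over-commitment idea boils down to simple arithmetic. The sketch below assumes the limit is expressed as a ratio of virtual CPUs to physical cores; the function names and the example figures are hypothetical and only illustrate the calculation.

```python
def cpu_overcommit_ratio(vcpus_per_vm, physical_cores):
    """Ratio of allocated vCPUs to physical cores; above 1.0 means over-committed."""
    if physical_cores <= 0:
        raise ValueError("physical_cores must be positive")
    return sum(vcpus_per_vm) / physical_cores

def within_limit(vcpus_per_vm, physical_cores, max_ratio):
    """True if the allocation stays at or below the chosen over-commitment ratio."""
    return cpu_overcommit_ratio(vcpus_per_vm, physical_cores) <= max_ratio

vm_vcpus = [4, 4, 2, 2, 8, 4]                           # 24 vCPUs allocated in total
print(cpu_overcommit_ratio(vm_vcpus, 16))               # 1.5
print(within_limit(vm_vcpus, 16, max_ratio=2.0))        # True
```

So 24 vCPUs on 16 physical cores is a 1.5:1 over-commitment, which would be allowed under a 2:1 limit but not under a 1:1 limit.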
There are a couple of options with regard to power management and DRS. These kick in when ESXi servers within the cluster are powered off and on. There may be times when, for example, you need to patch an ESXi server and so power it off and on a few times, and you don't want it joining and then leaving the cluster, which could cause issues.
You can enable or disable Distributed Power Management (DPM) within the cluster, and you can set the DPM threshold to be conservative or aggressive depending on how often you power ESXi servers off and on.
Lastly you have an Advanced Options section.
There are a number of advanced options that you can use; these are for the more advanced setups or are set on the recommendation of VMware themselves.
There are many options for DRS and which ones you need depends on what you have in your environment. For small and medium setups the defaults will be fine; for larger, more complex environments there are options to get the best out of the cluster.
There are times, at a weekend for example, when you may want to drop an ESXi server. For this you can schedule a DRS task for a specific date and time: in the top right-hand corner of the DRS section there is a Schedule DRS Task button.
The first setup screen allows you to name the task, give it a description, create the schedule and set up any email notifications.
The second setup screen has many of the options we talked about above.