Windows Server 2016 – Disk and Share quorum members (Failover Clustering)


Let's begin by stressing that Windows-based failover clusters need three or more voting members to be stable.

A failover cluster is a group of independent servers which work collaboratively to ensure high availability and scalability for workloads such as File Server, Exchange, SQL, and Virtual Machines.

The nodes (the servers) in the cluster are connected by physical cables and by software. If one of the cluster nodes fails, its services fail over to another node in the cluster so that service continues without interruption.

To provide this failover capability, the services on the cluster are proactively monitored to verify that they are working properly.

If they are not, they are moved to another node or restarted.

In failover clustering, the quorum concept is designed to prevent the issues caused by a "split-brain" condition in a cluster. The nodes in a cluster communicate with each other over the network to provide the failover functionality; when a problem occurs on the network, the nodes may become unable to communicate.

In that "split" situation, if both sides of the separated cluster consider themselves the brain of the cluster and try to write to the same disk, critical problems can follow. To avoid this when a network disconnection occurs, one side of the separated cluster must stop running as a cluster.

A two-node cluster can become unstable in certain circumstances without a third node or a disk or network share quorum member. This separate disk or network share acts as an additional voting member for cluster quorum operations.

[Figure: a two-node cluster without a quorum witness]

The majority of votes is two, so a two-node cluster like the one above is not really resilient: if you lose a node, the cluster goes down.

Each cluster member updates records on the quorum disk or in the quorum share folder, which allows the other quorum members to know the availability and health of each of the other members.

Clusters with an even number of nodes are vulnerable to a condition known as "split brain", where half of the members lose contact with the other half. The best solution for this is to create a disk or share quorum member.

A quick overview of the disk and file share quorum solutions is the following:

  • A quorum disk must have a partition of at least 512 MB.
  • A network share must have at least 5 MB of free space available.
  • When the cluster quorum is created, a directory will be created and populated with data.
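
Before changing anything, you can check which quorum model a cluster is currently using; a minimal sketch with the failover clustering PowerShell cmdlets (run it from one of the cluster nodes, or add -Cluster <cluster name> to target a remote cluster):

# Show the current quorum type and the witness resource, if any
Get-ClusterQuorum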

To create the quorum, run the following cluster commands in PowerShell:

To create a node majority quorum configuration (no witness)

Set-ClusterQuorum -Cluster <cluster name> -NodeMajority

To create a disk quorum witness

Set-ClusterQuorum -NodeAndDiskMajority "Cluster Disk 2"

Note: the disk name shown in Failover Cluster Manager should be changed to something more descriptive than Cluster Disk x. This can be done easily inside Failover Cluster Manager.
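
If you prefer PowerShell, the rename can also be scripted; a minimal sketch, assuming the witness disk resource is still called Cluster Disk 2 and using Quorum Witness Disk purely as an example name:

# Rename the cluster disk resource to something more descriptive
(Get-ClusterResource -Name "Cluster Disk 2").Name = "Quorum Witness Disk"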

To create a network share quorum witness

Set-ClusterQuorum -NodeAndFileShareMajority "\\fileserver\fsw"

Note: this network share must reside on a system other than the clustered system. The share cannot be hosted on the cluster itself, because the clustered file system is managed by the cluster, while the witness must be available when the cluster is down or booting.
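
On the file server that will host the witness, the share itself can be created with PowerShell. A minimal sketch using hypothetical names (C:\Shares\fsw for the folder, fsw for the share name, CONTOSO\MyCluster$ for the cluster name computer object), to be replaced with your own:

# Create the folder and share it, granting the cluster name object (CNO) full control on the share
New-Item -Path "C:\Shares\fsw" -ItemType Directory
New-SmbShare -Name "fsw" -Path "C:\Shares\fsw" -FullAccess 'CONTOSO\MyCluster$'

The NTFS permissions on the folder must also grant write access to the cluster name object (see the file share witness requirements below).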

Failover Cluster Quorum Witness

As said before, it is recommended to have an odd total number of votes. But sometimes we don't want an odd number of nodes. In this case, a disk witness, a file share witness, or a cloud witness can be added to the cluster.

This witness also has a vote. So when there is an even number of nodes, the witness brings the total number of votes back to an odd number. Below are the requirements and recommendations for each witness type (except the Cloud Witness, which is covered later):

Disk witness

This is a dedicated LUN that stores a copy of the cluster database and it is most useful for clusters with shared (not replicated) storage.

  • Size of LUN must be at least 512 MB.
  • Must be dedicated to cluster use and not assigned to a clustered role.
  • Must be included in clustered storage and pass storage validation tests.
  • Cannot be a disk that is a Cluster Shared Volume (CSV).
  • Basic disk with a single volume.
  • Does not need to have a drive letter.
  • Can be formatted with NTFS or ReFS.
  • Can be optionally configured with hardware RAID for fault tolerance.
  • Should be excluded from backups and antivirus scanning.
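
As a sketch of how such a LUN typically ends up in cluster storage (assuming it is already presented to every node and visible as an available disk), it can be added with the failover clustering cmdlets:

# Add every disk that is visible to all nodes and not yet clustered to Available Storage
Get-ClusterAvailableDisk | Add-ClusterDisk

The witness itself is then selected with the Set-ClusterQuorum command shown earlier.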

File share witness

This is an SMB file share that is configured on a file server running Windows Server.

  • Does not store a copy of the cluster database.
  • Maintains cluster information only in a witness.log file.
  • Most useful for multisite clusters with replicated storage.
  • Must have a minimum of 5 MB of free space.
  • Must be dedicated to the single cluster and not used to store user or application data.
  • Must have write permissions enabled for the computer object for the cluster name.

The following are additional considerations for a file server that hosts the file share witness:

  • A single file server can be configured with file share witnesses for multiple clusters.
  • The file server must be on a site that is separate from the cluster workload. This allows equal opportunity for any cluster site to survive if site-to-site network communication is lost. If the file server is on the same site, that site becomes the primary site, and it is the only site that can reach the file share.
  • The file server can run on a virtual machine if the virtual machine is not hosted on the same cluster that uses the file share witness.
  • For high availability, the file server can be configured on a separate failover cluster.

So below you can find again a two-node cluster, this time with a witness (disk or file share):

[Figure: a two-node cluster with a disk or file share witness]

Now that there is a witness, you can lose a node and still keep the quorum.

Even if a node is down, the cluster keeps working. So when you have an even number of nodes, the quorum witness is required. But to keep an odd total number of votes, you should not implement a quorum witness when you have an odd number of nodes.

Quorum Configuration Types

Below you can find the four possible quorum configurations:

Node Majority (Recommended for Clusters with an odd number of nodes)


  • Can sustain failures of half the nodes (rounding up) minus one. For example, a seven node cluster can sustain three node failures.

Node and Disk Majority (recommended for clusters with an even number of nodes).


In this quorum mode, running nodes and the disk witness are counted as quorum votes. If the disk witness is offline, the cluster can sustain failures of only half the nodes minus one.


In addition, when using this quorum mode, the disk witness also contains a replica of the cluster configuration database.


  • Can sustain failures of half the nodes (rounding up) if the disk witness remains online. For example, a six node cluster in which the disk witness is online could sustain three node failures.
  • Can sustain failures of half the nodes (rounding up) minus one if the disk witness goes offline or fails. For example, a six node cluster with a failed disk witness could sustain two (3-1=2) node failures.

Node and File Share Majority (Clusters with special configurations)

In this quorum mode, running nodes and a file share witness (like a disk witness) are counted as quorum votes.


Note that the file share witness does not keep a backup of the cluster configuration database. Instead, it only records which node has the latest version of the cluster configuration database.

  • Works in a similar way to Node and Disk Majority, but instead of a disk witness, this cluster uses a file share witness.
  • Note that if you use Node and File Share Majority, at least one of the available cluster nodes must contain a current copy of the cluster configuration before you can start the cluster. Otherwise, you must force the cluster to start from a particular node (see the sketch below).
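
A minimal sketch of forcing quorum from a chosen node, with Node1 used purely as a placeholder name (use this with care, and only on the node you trust to hold the most recent cluster configuration):

# Start the cluster service on this node and force it to form the cluster without quorum
Start-ClusterNode -Name "Node1" -ForceQuorum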

A file share witness is recommended for a geographically distributed cluster, for example when you are building disaster recovery capability.

For a file share witness, you need to have an additional site (in the figure below: Site C) other than the 2 sites where your cluster nodes are hosted and your services are running (in the figure below: Site A and Site B).

In Site A and Site B the cluster nodes are running, and in Site C the file share witness is hosted. By doing so, even if Site A goes down, Site B would still be up. (Note that the file share witness must be accessible by all nodes in both sites).

[Figure: cluster nodes in Site A and Site B with the file share witness in Site C]
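
Before relying on the witness, it is worth confirming that the file server is reachable from every node in both sites. A minimal, hedged sketch (fileserver is a placeholder name; this only tests network reachability of the SMB port, not the share permissions):

# From each cluster node, check that the file server answers on the SMB port (445)
Invoke-Command -ComputerName (Get-ClusterNode).Name -ScriptBlock {
    Test-NetConnection -ComputerName "fileserver" -Port 445
}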

However, maintaining a highly available file server in a third site (Site C) to host the file share witness is expensive, so it is possible to put the file share witness in Site A if you want (but this is not recommended).

In this case, Site A is considered the primary site. Be aware that if Site A fails or is disconnected from the network, the cluster stops running and will not fail over, because the majority of the quorum votes are in a failed state.

No Majority: Disk Only (not recommended)

Can sustain failures of all nodes except one (if the disk is online). However, this configuration is not recommended because the disk might be a single point of failure.


This mode is a legacy configuration and is not recommended. It is not a majority election at all: the nodes are not counted as quorum votes, only the disk is the quorum. As long as one node can communicate with the disk, the cluster stays up. In other words, you can run a cluster with a single node and the disk. But the disk is a single point of failure, and if the disk fails, the cluster becomes unavailable as well.
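
For completeness, this legacy mode can also be configured from PowerShell; a minimal sketch, assuming the quorum disk resource is named Cluster Disk 1:

Set-ClusterQuorum -DiskOnly "Cluster Disk 1"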

Cloud Quorum Witness (Windows Server 2016 feature)

Cloud Witness is a new quorum witness model for failover clustering that leverages the Microsoft Azure public cloud, using Azure Blob Storage to read and write a blob file.

A cloud witness works in the same way as a file share witness in that it does not contain a copy of the cluster database. In short, the additional site for the file share is replaced by the Azure public cloud, so this quorum witness model can also be used for multi-site clusters.

In addition, compared to a file share witness, a cloud witness is very cost effective: only a very small amount of data is written to the blob file, and the blob file is updated only when the state of the cluster nodes changes. You also save the cost of maintaining an additional data center site.

As a result, it is recommended to configure a cloud witness for a multi-site cluster if all nodes in the cluster can reach the internet. This is also why the File Share Witness model is still necessary in some cases.

Many companies have data centers with no internet access for security reasons. In that case, they cannot use the Azure public cloud for a cloud witness. To implement multi-site clustering capabilities such as disaster recovery for those data centers, the File Share Witness model (ideally hosted on a highly available file server, for example a separate failover cluster) can be used.

By implementing a Cloud Quorum Witness, you avoid spending money on a third room in the case of a stretched cluster. Below is the scenario:

[Figure: a stretched cluster using a Cloud Witness hosted in Microsoft Azure]

The Cloud Witness, hosted in Microsoft Azure, also has one vote (counted toward the quorum majority). For this you need an existing storage account in Microsoft Azure, as well as one of its access keys.


Now you just have to configure the quorum as you would for a standard witness. Select Configure a Cloud Witness when prompted in Failover Cluster Manager on Windows Server 2016.


Then specify the Azure Storage Account and a storage key.
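
The same configuration can be done with PowerShell instead of the wizard; a minimal sketch with placeholder values for the storage account name and its access key:

# Configure a cloud witness against an existing Azure storage account
Set-ClusterQuorum -CloudWitness -AccountName "<StorageAccountName>" -AccessKey "<PrimaryAccessKey>"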


At the end of the configuration process (a few seconds), the Cloud Witness should be configured and online.


Stretched Cluster Scenario

Finally, a note about a common but wrong way to manage the quorum witness. In some configurations we see a cluster stretched between two datacenters. Below is the wrong stretched scenario:

[Figure: the wrong stretched cluster scenario]

In this wrong scenario, the configuration seems to follow the recommendation for node majority: four nodes plus a witness, to obtain an odd majority of votes.

So let's go to production. The cluster runs for a while, and then one day Room 1 is under water. You lose Room 1:

[Figure: the stretched cluster after losing Room 1]

In this scenario you should also have stretched storage, so if you have implemented a disk witness it should fail over to Room 2.

In the case above you have lost the majority of votes, so the cluster stops working (sometimes, with some luck, the cluster keeps working because the disk witness has time to fail over, but that is just luck).

So when you implement a stretched cluster, take note of the following scenario:

[Figure: the recommended stretched cluster scenario]

In this last scenario, even if you lose a room, the cluster will still work. Take note that a stretched cluster configuration is not generally recommended. To address the witness quorum location, in Windows Server 2016 the witness can be hosted in Microsoft Azure (Cloud Witness).

Dynamic Quorum

Dynamic Quorum does what its name implies: it adjusts the quorum dynamically. Let's say we have a seven-node cluster.

So in an example scenario, assuming I did not lose four of the seven servers at the same time, as servers in the cluster went offline, the number of votes in the quorum would adjust dynamically.

When node one went offline, I would then in theory have a six node cluster. When node two went offline, I would then have a five node cluster, and so on. In reality, if I continued to lose cluster nodes one by one, I could go all the way down to a two node cluster and still remain online.

And, if I had configured a witness (Disk or File Share) I could actually go all the way down to a single node and still remain online.
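
You can watch this behaviour from PowerShell; a minimal sketch (DynamicQuorum is a cluster property, where 1 means enabled, and DynamicWeight shows the vote each node currently holds):

# Dynamic quorum has been enabled by default since Windows Server 2012
(Get-Cluster).DynamicQuorum

# Compare each node's assigned vote (NodeWeight) with its current dynamic vote (DynamicWeight)
Get-ClusterNode | Format-Table Name, State, NodeWeight, DynamicWeight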