Certs: 3.3 Configuring Site Level Fault Tolerance (70-412)

Hyper-V Replica is a Windows Server 2012 and 2012 R2 feature that lets you create a replica of a virtual machine running on a Hyper-V host on either platform. If the primary VM fails, you can fail over to the replica VM. Hyper-V Replica can therefore provide fault tolerance for a VM even if the entire host site goes offline.

Unlike a failover cluster, Hyper-V Replica does not use shared storage between the VMs. Instead, the replica VM begins with its own copy of the primary VM’s VHDs. The primary then sends updates of its changes (called replication data), and the replica repeatedly saves this data.

The replication frequency is fixed at every 5 minutes for Hyper-V on Windows Server 2012, and can be set to 30 seconds, 5 minutes, or 15 minutes on Windows Server 2012 R2.
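As a quick memory aid, the intervals above can be written down as a lookup table. This is only a sketch in Python: the platform labels and the function name are made up for illustration and are not part of any Hyper-V API.

```python
# Replication intervals in seconds per host platform, taken from the notes above.
# The dictionary keys are labels chosen for this sketch, not official identifiers.
ALLOWED_INTERVALS = {
    "2012": (300,),
    "2012 R2": (30, 300, 900),
}

def is_valid_interval(platform, seconds):
    """Return True if the given interval is offered on the given platform."""
    return seconds in ALLOWED_INTERVALS.get(platform, ())

print(is_valid_interval("2012 R2", 30))  # True
print(is_valid_interval("2012", 30))     # False
```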

Configuring Hyper-V Physical Host Servers 

It’s important to understand the sequence of steps in configuring Hyper-V Replica.

First, configure the server-level replication settings on both physical Hyper-V hosts, called the primary server and the replica server. Access these settings in Hyper-V Manager by right-clicking a host server and selecting Hyper-V Settings.

By default, replication is not enabled and no options are selected or configured.

To enable a physical Hyper-V host as a replica server, tick the “Enable this computer as a replica server” option at the top of the screen. Then configure the Authentication and Ports section, followed by the Authorization and Storage section.

  • Authentication and Ports – Choose which authentication methods you want available later when configuring the locally hosted VM for replication
    • Use Kerberos (HTTP) – Only available if the local server is domain joined. The advantage is that it requires no further configuration. The disadvantages are that the data sent is not encrypted and that it can only be used if the remote server is located in a trusted domain.
    • Must enable the firewall rule named Hyper-V Replica HTTP Listener (TCP-In)
    • Use Certificate-Based Authentication (HTTPS) – Can be enabled regardless of whether the local server is domain joined. The advantages are that the data is encrypted and that it allows replication with a host when there is no domain trust relationship in place. The disadvantage is that it’s more difficult to configure and requires an X.509v3 certificate with EKU support for both Client and Server Authentication.
    • Must enable the firewall rule named Hyper-V Replica HTTPS Listener (TCP-In)

It’s important to remember that Windows Server 2012 does not automatically enable these firewall rules. You must enable the rule that matches each authentication method you select.
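The trade-offs between the two authentication methods can be summarised as a small decision helper. This is a minimal sketch in Python that only encodes the rules stated in these notes. The class, field, and function names are hypothetical. The firewall rule names are the ones quoted above.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class AuthMethod:
    transport: str        # HTTP or HTTPS
    firewall_rule: str    # inbound rule that must be enabled manually
    encrypted: bool       # whether replication data is encrypted in transit

# Properties of each method as described in the notes above.
AUTH_METHODS = {
    "kerberos": AuthMethod("HTTP", "Hyper-V Replica HTTP Listener (TCP-In)", False),
    "certificate": AuthMethod("HTTPS", "Hyper-V Replica HTTPS Listener (TCP-In)", True),
}

def usable_methods(domain_joined, trusted_domain, has_certificate):
    """Return the methods that can work between two hosts, per the rules above."""
    methods = []
    if domain_joined and trusted_domain:
        methods.append("kerberos")     # needs domain membership and a trust
    if has_certificate:
        methods.append("certificate")  # works with or without a domain
    return methods

def firewall_rules_to_enable(methods):
    """Return the firewall rules that have to be enabled by hand."""
    return [AUTH_METHODS[m].firewall_rule for m in methods]
```

For example, a workgroup host that holds a suitable certificate gets only the certificate method, so only the HTTPS listener rule needs enabling.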

  • Authorization and Storage – Configure security settings on the local server that are used when the server acts as a replica. Here you specify the remote primary servers from which the local server will accept replication data.
    • Allow Replication from any Authenticated Server – The less secure option. The local server can receive replication data from any authenticated server.
    • Allow Replication from Specified Servers – Specify the primary server(s) authorized for the local replica server.
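The difference between the two authorization options comes down to one extra check on the replica. The Python sketch below models that check in a deliberately simplified way: it compares exact host names only, and the function and parameter names are invented for illustration.

```python
def accepts_replication(primary, authenticated, allow_any, allowed_primaries=()):
    """Decide whether a replica server accepts data from a primary server.

    Simplified model: names are compared exactly (case-insensitively).
    """
    if not authenticated:
        return False  # both options require successful authentication
    if allow_any:
        return True   # "any authenticated server" option
    allowed = {name.lower() for name in allowed_primaries}
    return primary.lower() in allowed  # "specified servers" option
```

With `allow_any=False` and `allowed_primaries=["hv01.example.com"]` (a made-up host name), an authenticated request from any other server is rejected.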

Configuring the VMs

The next step is to configure the chosen VM for replication. Begin by right-clicking the VM and selecting “Enable Replication”.

This opens the Enable Replication wizard, which has five configuration pages:

  • Specify The Replica Server – Specify the remote replication server by name
  • Specify Connection Parameters – Specify the authentication method, which must be one of the methods enabled at the server level in the previous step.

  • Choose Replication VHDs – By default all VHDs attached to the VM are enabled for replication. You can deselect any you do not want replicated.
  • Configure Additional Recovery – Most likely part to appear on the EXAM. You can keep only the latest recovery point or configure additional recovery points.
    • Recovery points are snapshots saved on the replica server. Replication traffic sends a new snapshot to the replica at every replication interval, but only the latest is saved by default. Selecting additional recovery points configures the replica to keep one extra snapshot per hour. If you perform a failover, you have the option of selecting the most recent version of the VM (which is always available) or one of the earlier hourly snapshots.

  • Volume Shadow Copy Service (VSS) Snapshot Frequency – This option is set on the same page as the additional recovery points. These are high-quality snapshots taken when the VM momentarily “quiesces” (gracefully pauses) activity in VSS-aware applications such as Microsoft Exchange and SQL Server. This ensures that failover will be error-free in these apps, and the pause is so short that users do not normally notice it. These snapshots are more processor intensive. These application-consistent snapshots are created in addition to the normal ones.

  • Choose Initial Replication Method – Specify how the initial copy of the VHDs attached to the primary VM will be sent to the replica. By default they are sent over the network, but you can instead export them to external media and physically transport them. A third option is to use an existing VM on the replica as the initial copy. Use this if you have restored an exact copy of the primary VM on the replica.
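The recovery point behaviour described above (the latest point is always kept, plus optional hourly points) can be illustrated with a toy retention model. This Python sketch assumes that the hourly point for a given hour is the last update received in that hour. That is a simplification for illustration, not a description of how Hyper-V picks the point internally, and the function name is hypothetical.

```python
def retained_points(arrivals, extra_hourly_points=0):
    """Return the arrival times (in minutes) of recovery points kept on the replica.

    arrivals: times in minutes at which replication updates arrived.
    extra_hourly_points: how many earlier hourly points to keep (0 = latest only).
    """
    if not arrivals:
        return []
    latest = max(arrivals)
    # Assumption for this sketch: the last update in each hour is that hour's point.
    last_in_hour = {}
    for t in sorted(arrivals):
        last_in_hour[t // 60] = t
    current_hour = latest // 60
    earlier = [t for hour, t in sorted(last_in_hour.items()) if hour < current_hour]
    kept = earlier[-extra_hourly_points:] if extra_hourly_points > 0 else []
    return kept + [latest]

updates = list(range(0, 185, 5))    # one update every 5 minutes for 3 hours
print(retained_points(updates))     # [180]
print(retained_points(updates, 2))  # [115, 175, 180]
```

With the default setting only the newest point survives. With two additional points, a failover can also go back to the end of either of the two previous hours.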

Configuring Failover TCP/IP Settings

Specify the TCP/IP settings that will apply to the replica after failover. By default it inherits the same IPv4 and IPv6 settings as the primary. However, in many cases the replica needs a different IP configuration. To configure this, use Hyper-V Manager on the replica server, right-click the replica VM, and select Settings. Expand Network Adapter and select Failover TCP/IP.

Enter the IP configuration details for the replica VM. Then, on the primary, assign the original IP configuration in the same area. Otherwise the replica settings will be copied back when the primary is restored.

Resynchronizing the Primary & Replica VMs

Resynchronization is a highly resource-intensive operation that is occasionally performed between the primary and replica VMs. It can occur at any time, but you can restrict it to off-peak hours or choose to run it manually.

Performing Hyper-V Replica Failover

There are three types of failover: planned, unplanned, and test.

  • Planned Failover – The only failover initiated from the primary. Used when it’s possible to manually shut down the primary VM. No data is lost in this failover type. With a planned failover, only the exact copy of the current primary VM and its VHDs can be failed over.

Start by shutting down the primary VM, then right-click it in Hyper-V Manager and select Planned Failover.

The latest updates are sent to the replica, the VM is failed over, and the replica VM is started automatically.

  • Unplanned Failover – Performed at the replica when the primary VM fails suddenly and cannot be brought back online. To perform an unplanned failover, in Hyper-V Manager on the replica server right-click the replica VM and select Failover.

You need to choose a recovery point, then the VM is started on the replica. The relationship with the primary is now broken and replication stops.

When you can bring the primary back online, you can resume replication by reversing the replication relationship. To reverse replication, right-click the VM on the replica and select “Reverse Replication”.

  • Test Failover – The only failover that can be performed while the primary VM is still running. The purpose is to confirm that everything is configured correctly and will work in an emergency. To perform one, right-click the replica VM, click Replication, and then click Test Failover.

Specify a recovery point, then a local disposable copy of the replica VM is created on the replica server. A new copy of the VM appears in Hyper-V Manager with the tag “-Test”.
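The three failover types above differ mainly in where they start, what state the primary is in, and whether you pick a recovery point. The Python sketch below lays this out as a lookup table with a small chooser. All names are illustrative and the table only restates the notes.

```python
# Summary of the three failover types as described above.
FAILOVER_TYPES = {
    "planned": {
        "initiated_on": "primary",
        "primary_state": "shut down manually first",
        "choose_recovery_point": False,  # only the current copy is failed over
    },
    "unplanned": {
        "initiated_on": "replica",
        "primary_state": "failed and cannot be brought online",
        "choose_recovery_point": True,
    },
    "test": {
        "initiated_on": "replica",
        "primary_state": "still running",
        "choose_recovery_point": True,
    },
}

def pick_failover(primary_available, validating_only=False):
    """Pick a failover type from the situation, following the notes above."""
    if validating_only:
        return "test"
    return "planned" if primary_available else "unplanned"
```

For instance, `pick_failover(primary_available=False)` gives `"unplanned"`, and `FAILOVER_TYPES["unplanned"]["initiated_on"]` confirms that it starts at the replica.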

Using Hyper-V Replica in a Failover Cluster

Using Hyper-V Replica with a failover cluster provides an off-site replica VM that is used to recover from site-level failures. The steps to configure a replica VM for a clustered VM differ slightly.

Begin by opening Failover Cluster Manager and adding the failover cluster role “Hyper-V Replica Broker”.

When the role is installed and configured, you need to set up the server replication settings. Right-click the Replica Broker node in Failover Cluster Manager and select Replication Settings.

On the remote replica server, configure replication as normal. However, if the replica server is itself a failover cluster, you need to configure it through Failover Cluster Manager and add the Replica Broker role there as well.

After configuring the host server settings enable replication by right-clicking the VM in Failover Cluster Manager and selecting Enable Replication.

Remember, to configure replication for a cluster VM using Failover Cluster Manager, you must install the Replica Broker role and configure this role with the replication settings.

Configure Hyper-V Replica Extended Replication

Extended replication is a feature available in Windows Server 2012 R2 that allows you to extend replication beyond the primary and replica servers to a third site.

Using Global Update Manager

When the state of a cluster changes (for example, when a node goes offline), all of the other nodes in the cluster must acknowledge the change before the cluster will commit the change to the cluster database.

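The Global Update Manager is responsible for managing these updates. Windows Server 2012 R2 allows you to configure the Global Update Manager mode through the DatabaseReadWriteMode property of the cluster object, which you read and set in PowerShell via the Get-Cluster cmdlet:

(Get-Cluster).DatabaseReadWriteMode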

  • 0 = All (write) and Local (read) – The default setting for all cluster workloads except Hyper-V. It requires that all nodes acknowledge the update before the change is committed to the database.
  • 1 = Majority (read and write) – The default setting for Windows Server 2012 R2 Hyper-V failover clusters. It requires that a majority of nodes acknowledge the update before it is committed to the database.
  • 2 = Majority (write) and Local (read) – This mode also requires that a majority of cluster nodes acknowledge the update before the change is committed to the database, but reads are served from the local node.
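To make the difference between the modes concrete, the number of acknowledgements each one needs can be worked out for a given cluster size. This is a small Python sketch of the arithmetic only. The function name is made up, and "majority" is taken to mean more than half of the nodes.

```python
def acks_required(mode, node_count):
    """Return how many nodes must acknowledge an update before it is committed."""
    if mode == 0:
        return node_count           # All (write): every node must acknowledge
    if mode in (1, 2):
        return node_count // 2 + 1  # Majority (write): more than half the nodes
    raise ValueError("mode must be 0, 1, or 2")

for mode in (0, 1, 2):
    print(mode, acks_required(mode, 7))
# 0 7
# 1 4
# 2 4
```

In a 7-node cluster, mode 0 waits for all 7 nodes, while modes 1 and 2 can commit once 4 nodes have acknowledged, so one slow node no longer holds up the update.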

Recovering Multi-Site Failover Cluster Failures

In some cases, it may be necessary to force a cluster restart during a multi-site cluster failure. For example, you have a 7-node cluster, with 4 nodes at the first site and 3 at the second. A network outage disrupts inter-site communication.

In this situation, the first site with 4 nodes would remain operational and the 3 nodes at the second site would shut down because they could not achieve Quorum.
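The quorum arithmetic in this example can be checked with a one-line rule. The Python sketch below assumes a plain node-majority model in which every node has one vote and there is no witness. The function name is illustrative.

```python
def has_quorum(partition_nodes, total_nodes):
    """Return True if a partition holds more than half of all node votes.

    Assumes one vote per node and no witness (simple node majority).
    """
    return partition_nodes > total_nodes / 2

print(has_quorum(4, 7))  # True: the first site stays online
print(has_quorum(3, 7))  # False: the second site shuts down
```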

However, suppose the network outage cuts off all external communication to the first site. The four nodes there keep quorum and stay online but cannot be reached by clients, while the three nodes at the second site shut down even though that site remains accessible to clients.

In this situation, you would force the nodes at the second site online by using the force quorum (/fq) switch:

Cluster.exe /fq

Everything will now work fine until network connectivity is restored to the first site, whose original 4 nodes still believe they have quorum. Two partitions of the same cluster then each consider themselves authoritative. This is called split-brain.

In Windows Server 2012 clusters, once connectivity has been restored you need to manually restart the nodes in the first site using the Cluster.exe /pq (prevent quorum) command.

Windows Server 2012 R2 improves on this: clusters automatically reconcile when they detect a split-brain configuration. The nodes started with the /fq switch are deemed authoritative, and the others automatically restart with the /pq switch without any intervention.
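The reconciliation rule for Windows Server 2012 R2 can be pictured with a toy model. The Python sketch below only restates the rule from these notes (the partition started with /fq wins, every other node restarts with /pq). It is not the cluster service's actual algorithm, and the data layout and node names are invented for illustration.

```python
def reconcile_split_brain(partitions):
    """Return the action for each node once connectivity is restored.

    partitions: list of dicts with "nodes" (list of names) and
    "forced_quorum" (True if the partition was started with /fq).
    """
    forced = [p for p in partitions if p["forced_quorum"]]
    if len(forced) != 1:
        raise ValueError("this sketch expects exactly one partition started with /fq")
    actions = {}
    for partition in partitions:
        for node in partition["nodes"]:
            if partition["forced_quorum"]:
                actions[node] = "authoritative"
            else:
                actions[node] = "restart with /pq"
    return actions

site1 = {"nodes": ["N1", "N2", "N3", "N4"], "forced_quorum": False}
site2 = {"nodes": ["N5", "N6", "N7"], "forced_quorum": True}
print(reconcile_split_brain([site1, site2])["N1"])  # restart with /pq
```

In the 7-node example, the three nodes forced online at the second site stay authoritative, and the four nodes at the first site rejoin without forming their own quorum.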

Site level fault tolerance was one of my favorite sections of the 70-412 exam content.

Enjoy

TSP Admin