Failover capabilities are available when a protected path is active between two peers. Failover is initiated from the destination peer in a native replication peer configuration. When failover completes, the destination peer changes role and the replicated data becomes writeable. Client applications can then connect to the cluster that previously held the destination role in replication and continue operating with that cluster instead of the primary cluster.
Note
Failover refers to the change in the replication relationship between the peers and the backed up data becoming writeable. The switching of client applications to the backup cluster is not seamless, since the cluster is a separate cluster. It requires client mounting to the backup cluster and similar VMS configurations to those which were in place on the primary cluster. For details, see Deploying a Failed Over Replication Peer as a Working Cluster.
There are two types of failover processes, which are used for both failing over and failing back depending on the scenario:
Graceful failover ensures no data is lost. It requires that the peers can communicate with each other. In a graceful failover, data is completely synced between the peers, the data on the source peer becomes read-only, and the replicated data on the destination peer becomes writeable.
After failover, replication resumes in the reverse direction. Clients can resume operating by connecting to views on the backup cluster.
During the graceful failover process, there is a period in which both peers are read-only.
A graceful failover proceeds as follows:
- A user initiates graceful failover from the destination peer's VMS.
- The source peer becomes read-only, while the destination peer temporarily remains read-only as well.
- If replication was already in progress before the failover was initiated, it is completed.
- Data is synced between the source and destination peers, by transferring any delta between the last snapshot taken on the source and the data on the source at the time of initiating failover.
- The now synced replica of the data on the destination peer becomes writeable.
- Replication continues as before, except that the direction is now reversed with a role reversal. The destination peer becomes the source peer and the source peer becomes the destination peer.
The result of graceful failover is:
- The protected path on the former source peer is now read-only, while the replicated path on the former destination peer is now writeable.
- Replication is enabled in the reverse direction relative to the pre-failover configuration. The former source peer is now the destination peer and vice versa.
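The graceful failover sequence and its result can be illustrated with a small in-memory model. This is only a sketch: the `Peer` class, the `graceful_failover` function, and the list used to stand in for data are hypothetical and are not part of VMS or any VAST API. The model also assumes that the destination's data is a prefix of the source's data and that no replication is in flight when failover starts.

```python
from dataclasses import dataclass, field


@dataclass
class Peer:
    """Toy model of one replication peer (hypothetical, not a VAST object)."""
    name: str
    role: str          # "source" or "destination"
    writeable: bool
    data: list = field(default_factory=list)


def graceful_failover(source: Peer, destination: Peer, reachable: bool = True) -> None:
    """Walk through the graceful failover steps on two in-memory peers."""
    if not reachable:
        raise RuntimeError("graceful failover requires communication between the peers")

    # The source becomes read-only while the destination is still read-only.
    source.writeable = False
    assert not source.writeable and not destination.writeable

    # Sync the delta: whatever the source holds that the destination lacks.
    # Assumes the destination's data is a prefix of the source's data.
    delta = source.data[len(destination.data):]
    destination.data.extend(delta)

    # The synced replica becomes writeable and the roles are reversed.
    destination.writeable = True
    source.role, destination.role = "destination", "source"


primary = Peer("primary", "source", True, ["a", "b", "c"])
backup = Peer("backup", "destination", False, ["a", "b"])  # "c" not yet replicated
graceful_failover(primary, backup)
print(primary)
print(backup)
```

In this model, no item is lost because the delta is transferred before the replica becomes writeable, which mirrors the guarantee described for graceful failover.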
Note
A protection policy that specifies a destination replication peer is automatically mirrored on that replication peer when it is created. In case of failover, the mirrored protection policy is used to continue replicating in the reverse direction.
Ungraceful failover is an option that is available even if there is no communication between the peers. In an ungraceful failover:
- The replicated path on the former destination peer becomes writeable. If the primary cluster is still operating, it also remains writeable.
- Replication is suspended. The peers change roles from source and destination to a third role called standalone, which reflects that the paths on both are writeable and that there is currently no replication between them.
An ungraceful failover proceeds as follows:
- A user initiates ungraceful failover from the destination peer's VMS.
- Replication between the peers is suspended.
- The replicated path on the destination peer becomes writeable.
- The role of the former destination peer becomes standalone.
The result of an ungraceful failover is:
- The protected path on both peers is writeable.
- Replication between the peers is suspended.
- Data that was not yet replicated is lost. If you resume operations using the protected path on the former destination peer, any data that had not been replicated from the former source peer is missing from the former destination peer.
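The state change in an ungraceful failover can be sketched in the same spirit. The dictionaries and the `ungraceful_failover` function below are hypothetical illustrations, not VMS objects or calls. The sketch assumes that a primary cluster that is still operating is passed in explicitly, and that an unreachable one is represented by `None`.

```python
from typing import Optional


def ungraceful_failover(destination: dict, source: Optional[dict] = None) -> None:
    """Apply the outcome of an ungraceful failover to dict-based toy peers.

    `source` is None when the primary cluster cannot be reached.
    """
    # Replication is suspended and the replicated path becomes writeable.
    destination.update(role="standalone", writeable=True, replicating=False)
    if source is not None:
        # A primary cluster that is still operating also remains writeable.
        source.update(role="standalone", writeable=True, replicating=False)


primary = {"role": "source", "writeable": True, "replicating": True}
backup = {"role": "destination", "writeable": False, "replicating": True}
ungraceful_failover(backup, primary)
print(primary["role"], backup["role"])
```

Unlike the graceful case, nothing is synced before the replica becomes writeable, so both paths can diverge from this point on.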
If the primary cluster is still in communication, you can perform a graceful failover.
In order to minimize the effective "downtime" in which the path on both peers is read-only, proceed as follows:
- If replication is in progress, wait until it completes before starting failover.
- If no replication has taken place for some time before the planned failover, force one by altering the schedule in the protection policy so that a replication takes place in the near future.
- If replication takes long enough that a significant delta is likely to remain to be synced to the destination peer, force a second replication immediately after the first one completes. This will be the quickest possible replication and enables you to start failover with the smallest delta.
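As a rough illustration of why a smaller delta shortens the read-only window, the following sketch estimates the sync time from the delta size. It assumes that the window is dominated by transferring the delta at a constant throughput, which is a simplification. The function name and all of the figures are hypothetical and are not measurements from any cluster.

```python
GIB = 1024 ** 3


def estimated_read_only_seconds(delta_bytes: int, throughput_bytes_per_s: float) -> float:
    """Estimate the read-only window, assuming it is dominated by the delta sync."""
    if throughput_bytes_per_s <= 0:
        raise ValueError("throughput must be positive")
    return delta_bytes / throughput_bytes_per_s


# Hypothetical figures: 1 GiB/s between the peers, two different deltas.
scenarios = [
    ("failover long after the last replication", 200 * GIB),
    ("failover right after a forced replication", 2 * GIB),
]
for label, delta in scenarios:
    seconds = estimated_read_only_seconds(delta, 1 * GIB)
    print(f"{label}: about {seconds:.0f} s read-only")
```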
If the primary cluster loses communication or is destroyed, it is possible to fail over to the destination peer with an ungraceful failover. In this case, operations effectively roll back to the last snapshot that was transferred to the destination peer. Any data that was written to the source peer after the point in time at which that last snapshot was captured is lost.
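The rollback can be pictured as a cut-off in time: writes made after the last transferred snapshot do not survive. The sketch below uses made-up timestamps and a hypothetical `lost_writes` helper. It treats a write made exactly at the snapshot time as captured, which is an assumption.

```python
from datetime import datetime


def lost_writes(write_times, last_transferred_snapshot):
    """Return the writes made after the last snapshot that reached the destination."""
    # Writes at or before the snapshot time are assumed to be captured by it.
    return [t for t in write_times if t > last_transferred_snapshot]


# Made-up timeline: one write before the snapshot, two after it.
snapshot = datetime(2024, 1, 1, 12, 0)
writes = [
    datetime(2024, 1, 1, 11, 55),
    datetime(2024, 1, 1, 12, 5),
    datetime(2024, 1, 1, 12, 20),
]
lost = lost_writes(writes, snapshot)
print(f"{len(lost)} of {len(writes)} writes are not on the destination peer")
```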
How you fail back to the primary cluster depends on the scenario:
Following a graceful failover, you can fail back by performing a graceful failover from the primary cluster, which has become the destination peer. This is a reversal of the original failover process. Data will be synced between the clusters and the original configuration will be restored. During the failover, both clusters are read-only. The length of the downtime depends on the size of the delta between the clusters at the time of failover. You can minimize this downtime by performing the failover as soon as possible after a restore point is created. This might mean modifying the replication schedule in advance of the failover.
Following an ungraceful failover, if the primary cluster comes back up and you want to fail back to it, you can do the following:
- Resume replication from the backup cluster that you had failed over to. The backup cluster becomes the source peer and the primary cluster becomes the destination peer.
- After a restore point is completed on the destination peer, perform a graceful failover from the destination peer. This will recover all of the latest written data from the backup cluster and enable you to continue working with the primary cluster.
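The role changes during this fail back can be traced with a small table of roles. This is a sketch only: `resume_replication` and `graceful_failover` are hypothetical helpers that model roles, not VMS operations, and data transfer is not modelled.

```python
def resume_replication(roles: dict, new_source: str, new_destination: str) -> dict:
    """Resume replication between two standalone peers in the given direction."""
    if set(roles.values()) != {"standalone"}:
        raise ValueError("expected both peers to be standalone")
    return {new_source: "source", new_destination: "destination"}


def graceful_failover(roles: dict) -> dict:
    """Reverse the replication direction by swapping the two roles."""
    swap = {"source": "destination", "destination": "source"}
    return {cluster: swap[role] for cluster, role in roles.items()}


# State after an ungraceful failover: both peers are standalone.
roles = {"primary": "standalone", "backup": "standalone"}

# Step 1: resume replication from the backup cluster.
roles = resume_replication(roles, new_source="backup", new_destination="primary")
after_resume = dict(roles)

# Step 2: graceful failover, initiated from the destination peer (the primary).
roles = graceful_failover(roles)
print(after_resume)
print(roles)
```

After the second step, the primary cluster holds the source role again, which corresponds to the original configuration.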