Replication to S3 is introduced in VAST Cluster 3.2.0.
Replication to S3 is a low-cost method of backing up the data on your VAST Cluster. The replication feature enables you to replicate a VAST Cluster's data to an AWS S3 bucket or to any custom destination that supports S3 access. You can restore the replicated data to the VAST Cluster it was replicated from or to a different VAST Cluster.
The introductory version of the replication feature replicates the entire file system.
To set up replication, first create a replication target. A replication target defines a destination to which data can be replicated. Then create a replication policy. A replication policy specifies a target, a starting time and a frequency for performing replication. Once you have a target and a policy, create a replication stream. The replication stream takes snapshots of the data at the configured points in time and performs replication to the target following each snapshot.
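The order of these steps can be sketched as a small data model. This is only an illustration of how the three objects reference each other, not the VAST Cluster API: the class names, field names and example values (such as "example-bucket") are all hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class ReplicationTarget:
    """Hypothetical stand-in for a destination that supports S3 access."""
    name: str
    bucket: str


@dataclass(frozen=True)
class ReplicationPolicy:
    """Hypothetical stand-in: a target, a start time and a frequency."""
    name: str
    target: ReplicationTarget
    start: datetime
    frequency: timedelta


@dataclass(frozen=True)
class ReplicationStream:
    """Hypothetical stand-in: a stream that follows one policy."""
    name: str
    policy: ReplicationPolicy


# Step 1: create the target.
target = ReplicationTarget(name="backup-target", bucket="example-bucket")

# Step 2: create a policy that references the target.
policy = ReplicationPolicy(
    name="daily",
    target=target,
    start=datetime(2020, 7, 1, 0, 0),
    frequency=timedelta(days=1),
)

# Step 3: create the stream that references the policy.
stream = ReplicationStream(name="root-stream", policy=policy)
```

Because the stream refers to the policy and the policy refers to the target, the objects can only be created in the order described: target, then policy, then stream.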
You can have multiple replication targets defined on a VAST Cluster at the same time.
Creating a replication target on a VAST Cluster enables you to replicate the data from the VAST Cluster to the target. It also exposes all data that was replicated to that target to client mounts of the VAST Cluster's file system.
A replication policy is a reusable configuration that defines the date and time to start replicating and a frequency for performing replications. For example, if you set the start time to midnight on July 1st, 2020 and the frequency to once a day, replication is performed every day at midnight beginning July 1st.
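The schedule that a start time and a frequency produce can be worked out with simple date arithmetic. The following is a minimal sketch that uses the example values above; the function name scheduled_times is hypothetical and is not part of any VAST interface.

```python
from datetime import datetime, timedelta
from itertools import islice


def scheduled_times(start, frequency):
    """Yield start, start + frequency, start + 2 * frequency, and so on."""
    current = start
    while True:
        yield current
        current += frequency


# Start at midnight on July 1st, 2020 and repeat once a day.
first_three = list(
    islice(scheduled_times(datetime(2020, 7, 1), timedelta(days=1)), 3)
)
for t in first_three:
    print(t.isoformat())
# prints:
# 2020-07-01T00:00:00
# 2020-07-02T00:00:00
# 2020-07-03T00:00:00
```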
Replication streams copy data from a VAST Cluster to a replication target. You can have only one replication stream defined and active on a VAST Cluster at any given time.
Immediately after you create a replication stream, the stream starts copying all data under the root directory to the replication target and continues copying until an initial state of sync between the VAST Cluster and the target is reached. Following initial sync, the stream performs replication at the times configured in the replication policy. Because of the immediate initial sync, the start time in the policy is in fact the time of the second replication, not the first.
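The effect of the initial sync on the timeline can be illustrated as follows. This sketch assumes that the policy start time is later than the time the stream is created; the behavior in the opposite case is not described here, so the sketch rejects it. The function name first_runs and the example creation time are hypothetical.

```python
from datetime import datetime, timedelta


def first_runs(created_at, start, frequency, count):
    """Return the times at which the first `count` replications begin."""
    if start <= created_at:
        # Assumption of this sketch only; not a documented restriction.
        raise ValueError("sketch only models a start time after creation")
    runs = [created_at]  # the immediate initial sync
    runs += [start + i * frequency for i in range(count - 1)]
    return runs[:count]


# Stream created on June 28th; policy starts July 1st and repeats daily.
runs = first_runs(
    created_at=datetime(2020, 6, 28, 15, 30),
    start=datetime(2020, 7, 1),
    frequency=timedelta(days=1),
    count=3,
)
for number, when in enumerate(runs, start=1):
    print(number, when.isoformat())
# prints:
# 1 2020-06-28T15:30:00
# 2 2020-07-01T00:00:00
# 3 2020-07-02T00:00:00
```

The first entry is the initial sync, and the policy start time appears as the second replication.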
The stream performs replication by taking a snapshot of the cluster's data at a point in time and then copying data to the target until replication is complete. Only the changes between the data at the time of the snapshot and the data previously copied to the target are copied. For each point in time, the stream creates a restore point from which the data can be accessed.
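The idea of copying only the changes can be sketched by comparing two views of the data. This is a conceptual illustration only: how a VAST Cluster actually detects changes is not described here, and modeling a snapshot as a dictionary that maps paths to contents is an assumption made for the example, as is the function name changes_since.

```python
def changes_since(previous, snapshot):
    """Return (to_copy, to_delete) needed to bring the target up to `snapshot`.

    `previous` is the data already copied to the target and `snapshot` is
    the data at the time of the new snapshot, both as {path: content}.
    """
    to_copy = {
        path: content
        for path, content in snapshot.items()
        if path not in previous or previous[path] != content
    }
    to_delete = sorted(path for path in previous if path not in snapshot)
    return to_copy, to_delete


previous = {"/a.txt": b"one", "/b.txt": b"two", "/old.txt": b"gone soon"}
snapshot = {"/a.txt": b"one", "/b.txt": b"TWO", "/c.txt": b"three"}

to_copy, to_delete = changes_since(previous, snapshot)
print(sorted(to_copy))  # prints ['/b.txt', '/c.txt']
print(to_delete)        # prints ['/old.txt']
```

The unchanged file /a.txt is not copied again, which is what keeps each replication after the initial sync small.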
Restore points are timestamped directories on the target that let you access data as it was at the point in time when the data was replicated. Each active replication stream creates a restore point every time it performs replication. Each restore point is named for the time at which the snapshot for that replication was created.
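Naming restore points by snapshot time makes it straightforward to find the one you need. The sketch below assumes a sortable UTC name format (YYYY-MM-DDTHH-MM-SSZ); the actual format of restore point names is not specified here, and both function names are hypothetical.

```python
from datetime import datetime, timezone

# Assumed name format; the real restore point format may differ.
NAME_FORMAT = "%Y-%m-%dT%H-%M-%SZ"


def restore_point_name(snapshot_time):
    """Build a directory name from a timezone-aware snapshot time."""
    return snapshot_time.astimezone(timezone.utc).strftime(NAME_FORMAT)


def latest_restore_point(names, not_after):
    """Return the newest restore point taken at or before `not_after`."""
    limit = restore_point_name(not_after)
    # With this format, string order matches chronological order.
    candidates = [name for name in names if name <= limit]
    return max(candidates) if candidates else None


names = [
    restore_point_name(datetime(2020, 7, day, tzinfo=timezone.utc))
    for day in (1, 2, 3)
]
chosen = latest_restore_point(
    names, datetime(2020, 7, 2, 18, 0, tzinfo=timezone.utc)
)
print(chosen)  # prints 2020-07-02T00-00-00Z
```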