Similarity-based data reduction, which complements VAST Cluster's data compression and deduplication mechanisms, is an optional feature that can save significant storage space when much of the data is similar yet not identical. The feature detects data blocks that are similar yet not identical and uses that similarity to store newly written data such that only the changes relative to the older similar blocks are stored. The similar blocks are linked together in clusters. The reduction ratio may be further improved at a later time through re-clustering, in which similarity linkages between blocks are broken to allow new linkages between blocks of higher similarity.
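The idea of storing only the change relative to a similar block can be illustrated with a minimal Python sketch. It is not VAST Cluster's implementation, which this text does not describe. It assumes equal-sized blocks, an XOR delta and zlib compression, and the block size and all function names are hypothetical.

```python
import random
import zlib

BLOCK_SIZE = 4096  # assumed block size, for illustration only


def encode_delta(reference: bytes, new_block: bytes) -> bytes:
    """Store new_block as a compressed XOR delta against a similar reference block."""
    if len(reference) != len(new_block):
        raise ValueError("this sketch only handles equal-sized blocks")
    delta = bytes(a ^ b for a, b in zip(reference, new_block))
    # Wherever the two blocks agree the delta is zero, so it compresses very well.
    return zlib.compress(delta)


def decode_delta(reference: bytes, stored: bytes) -> bytes:
    """Rebuild the original block from the reference block and the stored delta."""
    delta = zlib.decompress(stored)
    return bytes(a ^ b for a, b in zip(reference, delta))


# Build a pseudo-random (poorly compressible) reference block, then a similar
# block that differs from it in only three bytes.
rng = random.Random(0)
reference = bytes(rng.randrange(256) for _ in range(BLOCK_SIZE))
changed = bytearray(reference)
for offset in (10, 500, 4000):
    changed[offset] ^= 0xFF
similar = bytes(changed)

stored = encode_delta(reference, similar)   # delta against the similar block
standalone = zlib.compress(similar)         # the same block compressed on its own

# Print both sizes so they can be compared.
print(len(standalone), len(stored))
```

In this sketch the similar block is recovered exactly from the reference block plus the stored delta, and the delta is much smaller than the block compressed on its own, because ordinary compression finds little redundancy inside a single block of this kind.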
Similarity-based data reduction is enabled by default on newly installed clusters, although it is possible to install a cluster with similarity disabled. You can disable similarity at any time.
Enabling similarity causes data that is written afterwards to be written with similarity-based data reduction. When you enable the feature on a running cluster, you are also offered the option to rewrite data that was not written with similarity. This option will ensure that similarity is applied optimally to all data on the cluster.
The rewrite typically rewrites all data on the cluster that was written without similarity enabled. Its impact on storage media endurance is therefore approximately the same as that of deleting that amount of data from the cluster and writing it again.
The rewrite proceeds as a background task that cannot be paused or stopped. In case of severe performance degradation, it may be possible for VAST Support to throttle the process and reduce the performance impact.
The rewrite may take a while, and may impact performance for workloads.
DBox expansion is not available while the rewrite is in progress.
It is not recommended to disable Similarity during the rewrite, since the rewrite process cannot be paused or stopped. Disabling Similarity during rewrite causes the process to run indefinitely although it will stop applying similarity to the data.
In the VAST Web UI, open the Cluster tab of the Settings page. You can reach this page by searching at the top left or from the navigation menu on the left of the page.
In the Features section, slide the Enable similarity slider to the on position.
Click Save to save your change and then Yes to confirm your changes.
After confirming that you want to save your changes, you are prompted to choose if you want to run a rewrite on the cluster's current data:
You have enabled Similarity. Similarity does not affect data that was written before it is enabled. Do you want to run a rewrite of current data? Rewrite may impact workloads while it is in progress. Stopping rewrite requires support intervention. DBox expansion will not be available during rewrite. Are you sure you want to proceed?
Answer Yes to proceed with the rewrite or No to skip the rewrite.
If you chose to proceed with the rewrite, the rewrite begins and a progress bar appears at the top right of the page, reporting the current phase of the rewrite as it progresses and the percentage progress.
Run the VAST CLI command cluster modify with the --enable-similarity option:
vcli: admin> cluster modify --enable-similarity
You are prompted to choose if you want to run a rewrite on the cluster's current data:
Similarity does not affect data that was written before it is enabled. Do you want to run a rewrite of current data? [Y/n]
Enter 'y' to confirm that you want to proceed or 'n' if you do not want to proceed.
If you proceed, you are then warned:
Rewrite may impact workloads while it is in progress. Stopping rewrite requires support intervention. DBox expansion will not be available during rewrite. Are you sure you want to proceed? [Y/n]
If you want to run a rewrite, enter 'y' to confirm again. Otherwise, enter 'n' to cancel the rewrite.
If you chose to run the rewrite, it now begins.
You can now monitor the progress of the rewrite. Enter the command cluster show. The command output includes the following fields:
Rewrite-phase. During the rewrite, one of the main phases appears here. The order of the phases is:
Rewrite-progress. This shows the percentage progress of the current phase of the rewrite. When it reaches 100 for the final phase, Similarity is fully enabled.
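For scripted monitoring, the two fields can be pulled out of captured cluster show output. The sketch below is only an illustration: the exact output layout is not shown here, so it assumes each field appears on its own line as a name followed by a value (for example "Rewrite-phase: <name>" or a table row), and the function name and sample text are hypothetical.

```python
import re
from typing import Optional, Tuple

# Assumed line shapes: "Rewrite-phase: <name>" or "| Rewrite-progress | 42 |".
# The real layout of the command output may differ.
_FIELD = re.compile(
    r"^\W*(Rewrite-phase|Rewrite-progress)\W+(.*?)\W*$", re.IGNORECASE
)


def parse_rewrite_status(output: str) -> Tuple[Optional[str], Optional[float]]:
    """Return (phase, progress) found in the captured output, or None for a missing field."""
    phase: Optional[str] = None
    progress: Optional[float] = None
    for line in output.splitlines():
        match = _FIELD.match(line)
        if not match:
            continue
        name, value = match.group(1).lower(), match.group(2)
        if name == "rewrite-phase":
            phase = value or None
        else:
            try:
                progress = float(value)
            except ValueError:
                progress = None  # value was not numeric in this layout
    return phase, progress


# Hypothetical sample text; the phase name is a placeholder, not a real phase.
sample = "Name: cluster1\nRewrite-phase: EXAMPLE_PHASE\nRewrite-progress: 42\n"
phase, progress = parse_rewrite_status(sample)
print(phase, progress)
```

The parser is deliberately tolerant of separators and returns None for fields it cannot find, so a caller can tell "no rewrite reported" apart from a parsing problem.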
If a rewrite is running following enablement of Similarity, it is not recommended to disable Similarity while the rewrite is still in progress. Disabling Similarity during rewrite causes the rewrite process to run indefinitely although it will stop applying similarity to the data.
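A script that automates disabling could check the rewrite state first. The following sketch encodes only the rule stated above (the rewrite is finished when progress reaches 100 in the final phase). It assumes that a missing phase means no rewrite is running, and the function name and the phase names are hypothetical placeholders, since the actual phase names are not listed here.

```python
from typing import Optional


def rewrite_in_progress(
    phase: Optional[str], progress: Optional[float], final_phase: str
) -> bool:
    """Return True while a rewrite has started but has not reached 100% of its final phase."""
    if phase is None:
        # Assumption: no phase reported means no rewrite is running.
        return False
    finished = phase == final_phase and progress is not None and progress >= 100
    return not finished


# Placeholder name; substitute the real final phase of the rewrite.
FINAL_PHASE = "FINAL_PHASE_PLACEHOLDER"

if rewrite_in_progress("EARLIER_PHASE_PLACEHOLDER", 100.0, FINAL_PHASE):
    print("Rewrite still running: disabling similarity now is not recommended.")
```

Reaching 100 in an earlier phase does not count as completion in this sketch, which matches the statement that similarity is fully enabled only when the final phase reaches 100.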
To disable similarity-based data reduction at any time after installation, connect to the VAST CLI and run the command cluster modify --disable-similarity.