As with any storage system, there is a distinction in VAST between usable capacity and raw physical capacity. VAST reports physical capacity and logical capacity but not usable capacity, although logical capacity does converge to usable capacity once data reduction is considered. The product documentation explains the distinction between logical and physical capacity as shown in the Dashboard, but it may leave some questions unanswered. This article goes into more depth on physical vs. usable capacity and how both relate to logical capacity.
When considering usable capacity, one starts with the raw physical capacity of the underlying physical storage and then subtracts overhead:
- Data Protection - VAST protects data with large erasure coding/RAID stripes, and the parity blocks introduce overhead. The overhead depends on the size of the erasure coding stripe, which in turn depends on the number of D-boxes in the cluster. It is charged incrementally as new data is written: physical space used grows by more than the amount written. The overhead varies between roughly 2.7% and 11% as VAST varies the number of data blocks in the stripe between 150 and 36; the number of parity blocks is always 4. In this article we write a stripe as D+P, where D is the number of data blocks (36-150) and P is the number of parity blocks (4) per RAID stripe.
- Garbage Collection - While VAST is not a log-based file system in the traditional sense, it needs to keep a certain amount of physical space free for its background activities. Today the GC overhead is fixed at 15%, but it will drop over time as we enhance VAST; our eventual objective is to reduce it to just 8%. The GC overhead is shown in two parts: at initial cluster bring-up, the physical capacity shown is reduced by 8%, and over time, as data is written, physical space used is charged an additional 7%. When VAST lowers the overhead in future releases (the first phase is coming in 2.1), a software upgrade will increase the usable capacity.
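The two overheads above combine into a usable percentage with simple arithmetic. The sketch below is illustrative only. It assumes that RAID overhead is expressed as P/D (parity blocks relative to data blocks), which matches the 2.7% and 11% figures quoted in this article, and that both overheads are simply subtracted from 100%. The function names are hypothetical and are not part of any VAST tool.

```python
def raid_overhead(data_blocks: int, parity_blocks: int = 4) -> float:
    """Parity overhead as a fraction of the data blocks (P / D).

    Assumption: the article's percentages are P/D, not P/(D+P).
    """
    if data_blocks <= 0:
        raise ValueError("data_blocks must be positive")
    return parity_blocks / data_blocks


def usable_fraction(data_blocks: int, gc_overhead: float = 0.15) -> float:
    """Fraction of raw capacity left after RAID and GC overhead."""
    return 1.0 - raid_overhead(data_blocks) - gc_overhead


# Compare the narrowest (36+4) and widest (150+4) stripes at today's 15% GC overhead.
for d in (36, 150):
    print(f"{d}+4: RAID overhead {raid_overhead(d):.1%}, "
          f"usable {usable_fraction(d):.1%}")
```

Passing `gc_overhead=0.125` models the lower GC overhead expected in VAST 2.1.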
Please keep in mind that VAST does not report usable capacity. The tables below are to help you plan how much you can store on a VAST system.
In tabular form, here is what the above means in terms of usable capacity:
Releases prior to VAST 2.1:

| # of D-boxes | RAID Stripe | RAID Overhead | GC Overhead | Usable Percentage Before Data Reduction | TB Usable per D-box (440TB TLC D-boxes) | TB Usable per D-box (675TB QLC D-boxes) | TB Usable, Total System (675TB QLC D-boxes) |
|---|---|---|---|---|---|---|---|
| 4 or more | 150+4 | 2.7% | 15% | 82.3% | 362 | 556 | (# of D-boxes) x 556 |
* Temporarily, a 2 D-box QLC system has a RAID stripe of 36+4, for a RAID overhead of 11%. That limitation will be removed in VAST 2.0.7.
Later this year we plan to release VAST 2.1 with lower GC Overhead. We expect it to drop from 15% to 12.5%, resulting in this usable capacity:
VAST 2.1:

| # of D-boxes | RAID Stripe | RAID Overhead | GC Overhead | Usable Percentage Before Data Reduction | TB Usable per D-box (440TB TLC D-boxes) | TB Usable per D-box (675TB QLC D-boxes) | TB Usable, Total System (675TB QLC D-boxes) |
|---|---|---|---|---|---|---|---|
| 4 or more | 150+4 | 2.7% | 12.5% | 84.8% | 373 | 572 | (# of D-boxes) x 572 |
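For planning, the tables can be turned into a small calculator. This is a minimal sketch that uses only the usable percentages and raw D-box sizes from the tables above. The dictionary and function names are hypothetical, and applying the "total system" formula to TLC D-boxes is an assumption, since the tables give the total only for QLC.

```python
# Usable percentage before data reduction, copied from the tables above.
USABLE_FRACTION = {"pre-2.1": 0.823, "2.1": 0.848}

# Raw capacity per D-box in TB, from the table headers.
RAW_TB = {"TLC": 440, "QLC": 675}


def usable_tb(num_dboxes: int, media: str = "QLC", release: str = "pre-2.1") -> int:
    """Usable TB for a cluster, before data reduction.

    Mirrors the tables: round the per-D-box figure first, then multiply
    by the number of D-boxes.
    """
    if num_dboxes < 4:
        raise ValueError("the tables only cover clusters of 4 or more D-boxes")
    per_dbox = round(RAW_TB[media] * USABLE_FRACTION[release])
    return num_dboxes * per_dbox


# Example: an 8 D-box QLC cluster before and after the 2.1 upgrade.
print(usable_tb(8, "QLC", "pre-2.1"), usable_tb(8, "QLC", "2.1"))
```

The difference between the two results is the capacity that a software upgrade to 2.1 would be expected to free up on the same hardware.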
Keep in mind that the actual amount you can write depends heavily on data reduction. Data reduction shrinks the physical space consumed by your data below what is logically written, which increases the logical space available. VAST uses three techniques for data reduction: global compression, global deduplication, and soon global similarity compression. VAST reports the data reduction achieved, but the reported value does not include the overheads described earlier, which slightly increase what is written to disk. In most cases data reduction exceeds the overheads by a large margin.
Logical capacity is a prediction of how much data you will be able to store, assuming similar future data reduction ratios and no change in the overheads. Over time, logical capacity is a much better predictor of the amount of data that you can store in VAST than physical or usable capacity. The only caveat is that you need to store enough representative data in the system before the logical capacity prediction is meaningful.
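To make the idea of a prediction concrete, here is one plausible way to model it. This is a hypothetical model, not VAST's actual algorithm: it assumes that logical space free is the physical space free, with the incremental RAID and GC overheads backed out, multiplied by the data reduction ratio observed so far. The default overheads (11% RAID, 7% incremental GC) are figures quoted in this article, and the function name is made up for illustration.

```python
def predicted_logical_free(
    physical_free: float,
    data_reduction_ratio: float,
    raid_overhead: float = 0.11,
    gc_incremental: float = 0.07,
) -> float:
    """Estimate logical space free from physical space free.

    Hypothetical model: each reduced byte costs (1 + RAID + GC) physical
    bytes, and each reduced byte represents `data_reduction_ratio`
    logical bytes. The result is in the same unit as `physical_free`.
    """
    if data_reduction_ratio <= 0:
        raise ValueError("data_reduction_ratio must be positive")
    reduced_capacity = physical_free / (1 + raid_overhead + gc_incremental)
    return reduced_capacity * data_reduction_ratio


# Example: 621 TB of physical space free and a 2:1 ratio observed so far.
print(f"{predicted_logical_free(621, 2.0):.0f} TB logical free (estimate)")
```

Under this model the estimate moves whenever the observed ratio changes, which is why the prediction is only meaningful once representative data has been stored.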
We close with an example that should make this more concrete. Suppose a cluster contains one 675TB D-box. The total physical space in VMS will be reported as 92% of 675TB = 621TB; the missing 8% is the part of the GC overhead that is taken at cluster bring-up, as mentioned earlier. Now suppose an application writes 1GB that data reduction can reduce by 50%. The following happens:
- The application logically writes 1GB and normal file system APIs report that 1GB was written
- Data reduction will reduce the actual data to be written to 500MB
- RAID overhead adds 11% more physical space to be used
- GC overhead adds 7% (4.5% with 2.1) more physical space to be used
- Total bytes written is then 500MB x (100% + 11% + 7%) = 590MB
- Physical space used will increase by 590MB and physical space free will decrease by 590MB
- Logical space used will increase by 1GB
- Logical space free will be adjusted to reflect the new predicted storage space. If the previous data reduction ratio (DRR) was 2:1, then logical space free will be reduced by 1GB.
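The arithmetic in the steps above can be reproduced in a few lines. This is a minimal sketch that uses the figures from the example (50% data reduction, 11% RAID overhead, 7% incremental GC overhead, or 4.5% with 2.1). The function name and the returned field names are hypothetical and do not correspond to any VAST API.

```python
def account_write(
    logical_mb: float,
    reduction_ratio: float = 2.0,   # 2:1, i.e. data reduced by 50%
    raid_overhead: float = 0.11,
    gc_incremental: float = 0.07,
) -> dict:
    """Return the capacity changes caused by one write, in MB."""
    reduced_mb = logical_mb / reduction_ratio
    physical_mb = reduced_mb * (1 + raid_overhead + gc_incremental)
    return {
        "logical_used_mb": logical_mb,    # what the application sees
        "reduced_mb": reduced_mb,         # after data reduction
        "physical_used_mb": physical_mb,  # after RAID and GC overhead
    }


today = account_write(1000)                           # 1GB write, current release
with_2_1 = account_write(1000, gc_incremental=0.045)  # same write with 2.1

print(f"today: {today['physical_used_mb']:.1f} MB physical")
print(f"2.1:   {with_2_1['physical_used_mb']:.1f} MB physical")
```

Changing `reduction_ratio` shows how strongly the physical cost of a write depends on how well the data reduces, compared with the fixed overheads.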