As with any storage system, there is a distinction in VAST between usable capacity and raw physical capacity. VAST reports physical capacity and logical capacity but not usable capacity - although logical capacity does converge to usable capacity once data reduction is considered. The product documentation explains the distinction between logical and physical capacity in the Dashboard, but that may leave some questions unanswered. In this article we go into a bit more depth on physical vs. usable capacity and how that relates to logical capacity.
Usable Capacity
When considering usable capacity, one starts with the raw physical capacity of the underlying physical storage and then subtracts overhead:
- Data Protection - VAST uses large erasure coding/RAID stripes to protect data, and the parity blocks introduce overhead. The overhead depends on the width of the erasure coding stripe, which in turn depends on the number of storage enclosures (D-boxes) in the cluster. It is accounted for incrementally as new data is written: physical space used grows by more than the amount written after data reduction. VAST varies the number of data blocks in the stripe between 36 and 146 while the number of parity blocks is always 4, so the overhead ranges from roughly 2.7% (146+4) to 10% (36+4) of the stripe. In this article we write a stripe as D+P, where D is the number of data blocks (36-146) and P is the number of parity blocks (4) per stripe. P is the overhead.
- Reserve - While VAST is not a log based file system in the traditional sense, VAST needs to keep a certain amount of physical space free in order to enable its background activities. The reserve will drop with each version of VAST. Eventually our objective is to reduce the overhead to just 8%. In keeping with that objective, the reserve is shown in two parts - at initial cluster bringup by reducing physical capacity shown by 8% and over time by increasing the physical space used by each write by an additional percentage that depends on the release. The second value is shown incrementally because the goal is to reduce it to zero eventually. The release specific additional percentages are:
- Before VAST 2.1: 7% for a total of 15%
- 2.1.x and later: 4.5% for a total of 12.5%
- 3.0.x and later: 3% for a total of 11%
- 3.2.x and later: 1% for a total of 9%
A side effect of changing the reserve overhead with each release is that the physical used space is reduced upon upgrade to account for the reduction in overhead with that release.
The fixed initial space reservation can be easily shown as in this diagram:
This is what is shown as the physical space upon cluster initialization. Please keep in mind that VAST does not report usable capacity. The tables below are to help you plan how much you can store on a VAST system.
In tabular form, here is what the above means in terms of usable capacity - assuming no data reduction:
Releases Prior to VAST 2.1 (TB Usable)
# of D-boxes | Data Protection Stripe Size | Protection Overhead | Reserve Overhead | Usable Percentage Before Data Reduction | Per Storage Enclosure: 440TB TLC | Per Storage Enclosure: 675TB QLC | Total System: 675TB QLC |
1 | 36+4 | 10% | 15% | 75% | 330 | 506 | 506 |
2 | 76+4* | 5% | 15% | 80% | 352 | 540 | 1080 |
3 | 116+4 | 3.3% | 15% | 81.7% | 359 | 551 | 1654 |
4 | 146+4 | 2.7% | 15% | 82.3% | 362 | 556 | 2222 |
4 or more | 146+4 | 2.7% | 15% | 82.3% | 362 | 556 | (# D-box) X 556 |
VAST 2.1 (TB Usable)
# of D-boxes | Data Protection Stripe Size | Protection Overhead | Reserve Overhead | Usable Percentage Before Data Reduction | Per Storage Enclosure: 440TB TLC | Per Storage Enclosure: 675TB QLC | Total System: 675TB QLC |
1 | 36+4 | 10% | 12.5% | 77.5% | 341 | 523 | 523 |
2 | 76+4 | 5% | 12.5% | 82.5% | 363 | 556 | 1114 |
3 | 116+4 | 3.3% | 12.5% | 84.2% | 370 | 568 | 1704 |
4 | 146+4 | 2.7% | 12.5% | 84.8% | 373 | 572 | 2291 |
4 or more | 146+4 | 2.7% | 12.5% | 84.8% | 373 | 572 | (# D-box) X 572 |
VAST 3.0 (TB Usable)
# of D-boxes | Data Protection Stripe Size | Protection Overhead | Reserve Overhead | Usable Percentage Before Data Reduction | Per Storage Enclosure: 440TB TLC | Per Storage Enclosure: 675TB QLC | Total System: 675TB QLC |
1 | 36+4 | 10% | 11% | 79% | 348 | 533 | 533 |
2 | 76+4 | 5% | 11% | 84% | 370 | 567 | 1134 |
3 | 116+4 | 3.3% | 11% | 85.7% | 377 | 578 | 1735 |
4 | 146+4 | 2.7% | 11% | 86.3% | 380 | 583 | 2331 |
4 or more | 146+4 | 2.7% | 11% | 86.3% | 380 | 583 | (# D-box) X 583 |
VAST 3.2 (TB Usable)
# of D-boxes | Data Protection Stripe Size | Protection Overhead | Reserve Overhead | Usable Percentage Before Data Reduction | Per Storage Enclosure: 440TB TLC | Per Storage Enclosure: 675TB QLC | Total System: 675TB QLC |
1 | 36+4 | 10% | 9% | 81% | 356 | 547 | 547 |
2 | 76+4 | 5% | 9% | 86% | 378 | 580 | 1161 |
3 | 116+4 | 3.3% | 9% | 87.7% | 386 | 592 | 1776 |
4 | 146+4 | 2.7% | 9% | 88.3% | 389 | 596 | 2384 |
4 or more | 146+4 | 2.7% | 9% | 88.3% | 389 | 596 | (# D-box) X 596 |
Note: half populated storage enclosures have slightly different overhead values. Consult your VAST representative for the specifics.
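The arithmetic behind the tables can be sketched in a few lines of Python. This is only an illustration built from the figures quoted in this article, not a VAST tool or API; the function and constant names are hypothetical, and it assumes fully populated enclosures. Because the tables round the intermediate percentages, results for larger clusters can differ from the tables by a TB or so.

```python
# Hypothetical helper names; all figures come from the tables in this article.
PARITY_BLOCKS = 4

# Data blocks per stripe by number of D-boxes (4 or more D-boxes use 146).
DATA_BLOCKS_BY_DBOXES = {1: 36, 2: 76, 3: 116, 4: 146}

# Total reserve overhead in percent (fixed 8% plus the release-specific part).
RESERVE_PERCENT_BY_RELEASE = {"pre-2.1": 15.0, "2.1": 12.5, "3.0": 11.0, "3.2": 9.0}


def protection_overhead_percent(dboxes: int) -> float:
    """Parity share of a D+P stripe, as a percentage of the whole stripe."""
    data_blocks = DATA_BLOCKS_BY_DBOXES[min(dboxes, 4)]
    return 100.0 * PARITY_BLOCKS / (data_blocks + PARITY_BLOCKS)


def usable_tb(raw_tb_per_dbox: float, dboxes: int, release: str) -> float:
    """Usable capacity before data reduction, in TB, for the whole system."""
    usable_percent = (
        100.0
        - protection_overhead_percent(dboxes)
        - RESERVE_PERCENT_BY_RELEASE[release]
    )
    return raw_tb_per_dbox * dboxes * usable_percent / 100.0


# One 675TB QLC enclosure on VAST 3.2: 100% - 10% - 9% = 81% of 675TB.
print(round(usable_tb(675, 1, "3.2")))  # 547
```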
Logical Capacity
The actual amount of data that can be written depends heavily on data reduction. Data reduction reduces the amount of physical space consumed by data, which results in an increase in the logical space available for the data. VAST uses three techniques for data reduction: global compression, global deduplication, and very soon global similarity compression. Data reduction reduces the amount of data written to some amount below what is logically written. Data reduction is reported in VAST and readily visible in the dashboard. The value reported does not include the overheads mentioned earlier, which slightly increase what is written to disk. In most cases data reduction exceeds the overheads by a large margin.
Logical capacity is a prediction of how much data can be stored assuming similar future data reduction ratios and no change in the overheads. Logical capacity is a much better predictor over time of the amount of data that can be stored in VAST than physical or usable capacity. The only caveat is that one must first store enough representative data into the system before the logical capacity prediction is meaningful.
The following diagram shows the relationship between logical and physical capacity. The most important takeaway is that the total logical space is calculated dynamically from the actual logical used, physical used, and total physical space. Essentially, the ratio of logical used to total logical space must always equal the ratio of physical used to total physical space. Logical free is adjusted accordingly: logical free = total logical space - logical used.
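As a rough illustration, the proportionality described above can be rearranged to give the total logical space directly. The sketch below is not VAST code; the function name and the sample numbers are made up for the example, and it simply solves logical used / total logical = physical used / total physical for the total logical space.

```python
from typing import Tuple


def logical_capacity(
    logical_used: float, physical_used: float, physical_total: float
) -> Tuple[float, float]:
    """Return (total logical space, logical free) in the unit of the inputs."""
    if physical_used <= 0:
        # With nothing stored there is no reduction ratio to extrapolate from.
        raise ValueError("physical_used must be positive")
    logical_total = logical_used * physical_total / physical_used
    return logical_total, logical_total - logical_used


# Illustrative numbers only: 1000TB written logically, occupying 500TB
# of a cluster that reports 2484TB of physical space.
total, free = logical_capacity(1000.0, 500.0, 2484.0)
print(total, free)  # 4968.0 3968.0
```

The guard on `physical_used` mirrors the caveat above: until enough representative data has been stored, the prediction is not meaningful.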
Example
The following diagram summarizes the VAST behavior with respect to the initial space reservation of 8%, data reduction, erasure coding, and incremental space reservation (4.5% today, 3% soon, and declining more in the future) for a single 1GB write:
We close with an example that should make this more concrete. Suppose a cluster contains four 675TB storage enclosures. VMS reports the total physical space per enclosure as 92% of 675TB = 621TB; as mentioned earlier, the remaining 8% is the initial space reservation for internal overhead. For four enclosures the space reported is 621 x 4 = 2484TB. Now suppose an application writes 1GB that data reduction can shrink by 50%. On a cluster with at least four full-sized storage enclosures (enabling 146+4 stripes), the following happens:
- The application logically writes 1GB and standard file system APIs report that 1GB was written
- Data reduction will reduce the actual data to be written to 500MB
- Erasure coding overhead will add about 2.8% (4/146, rounded up) more physical space to be used
- Reserve overhead will add in 4.5% (3% with 3.0) more physical space to be used
- Total bytes written is then 500MB x (100% + 2.8% + 4.5%) = 536.5MB
- Physical space used will increase by 536.5MB and physical space free will decrease by 536.5MB
- Logical space used will increase by 1GB
- Logical space free will be adjusted to reflect the new predicted logical storage space remaining.
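The steps above can be reproduced with a short Python sketch. It is an illustration under stated assumptions rather than VAST's actual accounting code: the function names are hypothetical, 1GB is treated as 1000MB as in the example, and the overhead percentages are passed in as the rounded figures quoted in this article.

```python
INITIAL_RESERVE_PCT = 8.0  # fixed reservation taken at cluster bring-up


def reported_physical_tb(raw_tb_per_dbox: float, dboxes: int) -> float:
    """Physical capacity shown after the initial 8% reservation."""
    return raw_tb_per_dbox * dboxes * (100.0 - INITIAL_RESERVE_PCT) / 100.0


def physical_mb_for_write(
    logical_mb: float, reduction_pct: float, protection_pct: float, reserve_pct: float
) -> float:
    """Physical space consumed by a single write, in MB."""
    # Data reduction shrinks the payload first.
    reduced_mb = logical_mb * (100.0 - reduction_pct) / 100.0
    # Erasure coding and the incremental reserve are then added on top.
    return reduced_mb * (100.0 + protection_pct + reserve_pct) / 100.0


print(round(reported_physical_tb(675, 4)))                   # 2484
print(round(physical_mb_for_write(1000, 50, 2.8, 4.5), 1))   # 536.5
print(round(physical_mb_for_write(1000, 50, 2.8, 3.0), 1))   # 529.0
```

The last line shows the same write with the 3% incremental reserve quoted for VAST 3.0.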