Audience
This guide helps you understand the Probe hardware and software requirements. It is intended for customers who run the Probe on their own infrastructure. To learn more about how the Probe works, review Running the Probe. To get the Probe running quickly, follow the instructions in Vast Probe Quickstart.
Hardware Minimum Requirements
Actual hardware requirements depend on the amount of data to be scanned. Examples on how to scope hardware based on dataset size are provided at the end of this page.
- 16 or more CPU cores, Intel Broadwell-compatible or later
  - The Probe requires CPU instructions that are not available on older CPUs
  - The Probe will run as a virtual machine on Intel-based hardware whose virtual cluster vMotion minimum compatibility is Intel Broadwell-compatible or later
  - The Probe has not been evaluated on AMD CPUs
- 128 GB RAM or more
  - The Probe consumes almost 100 GB of RAM upon launch
  - The more RAM, the better the Probe will perform and the more data can be scanned
- 10 GbE Networking or higher
- 50 GB or more of SSD-backed local storage (NVMe or FC/iSCSI LUNs)
  - This local SSD capacity is needed for the database the Probe builds and for logging
  - Allocate local SSD capacity equivalent to at least 0.6% of the data to be scanned (see the sizing sketch after this list)
  - The disk storage must deliver very high sustained IOPS
  - The larger the local SSD allocation, the more data can be scanned
  - The local SSD filesystem should be ext4 or xfs
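To illustrate the 0.6% sizing rule above, here is a minimal Python sketch that estimates the local SSD capacity for a given dataset size. The function name and sample value are illustrative, not part of the Probe tooling:

def required_ssd_gb(dataset_tb):
    # Local SSD for the disk index and logs: ~0.6% of the data to be scanned.
    return dataset_tb * 1000 * 0.006   # 1 TB treated as 1000 GB for sizing

# Example: a 100 TB dataset needs roughly 600 GB of local SSD.
print(required_ssd_gb(100))   # -> 600.0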
Operating System Minimum Requirements
We've tested the following, but most modern Linux distributions should be fine:
- Ubuntu 18.04, 20.04
- CentOS/RHEL 7.4+
- Rocky/RHEL 8.3+
Software Requirements
- Docker 17.05+
- python3 (for launching the Probe)
- screen (for running the Probe in the background)
- wget (for downloading the Probe image)
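Before launching the Probe, a small Python 3 check like the one below can confirm these tools are on the PATH. This helper is only a convenience sketch, not part of the Probe package:

import shutil

# Illustrative pre-flight check for the prerequisites listed above.
for tool in ("docker", "python3", "screen", "wget"):
    path = shutil.which(tool)
    print(f"{tool}: {path if path else 'NOT FOUND'}")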
Filesystem Requirements (For Probing For Data Reduction)
Be aware that if the filesystem has atime enabled, any scanning, even while mounted read-only, will update atime.
- NFS: The Probe host has to be provided root-squash and read-only access
  - For faster scanning, use an operating system that has nconnect support (see the mount sketch after this list):
    - Ubuntu 20.04+
    - RHEL/Rocky 8.4+
- Lustre: The Probe host and container must be able to read as a root user
- GPFS: The Probe host and container must be able to read as a root user
- SMB: The share should be mounted on the Probe host with a user in the BUILTIN\Backup Operators group to avoid file access issues.
- S3/Object: We have tested internally with Goofys as a method of imitating a filesystem
  - It is not recommended to scan anything in AWS Glacier or equivalent
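For NFS sources, a hedged Python sketch of a read-only mount that also enables nconnect is shown below. The server, export path, mount point, and connection count are placeholders, and nconnect requires one of the operating systems listed above:

import subprocess

# Placeholder values - substitute your NFS server, export, and mount point.
server_export = "nfs-server:/export/data"
mount_point = "/mnt/probe-scan"

# Read-only mount; nconnect opens multiple TCP connections for faster scanning.
subprocess.run(
    ["mount", "-t", "nfs", "-o", "ro,nconnect=8", server_export, mount_point],
    check=True,
)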
Hardware Requirement Examples
Example A: You have a server with 768GB of RAM:
- ~154GB (20% of total) is reserved for the operating system, leaving 614GB of RAM
- Scanning 100 million files will occupy ~5GB of RAM, leaving 609GB of RAM
  - 50 bytes per filename
- This leaves 609GB of RAM available for the RAM index:
  - --ram-index-size-gb 609
- This can scan up to 99TB of data using just RAM, and no significant local SSD space is needed
  - This calculation is based on the 0.6% rule to accommodate similarity and deduplication hashes
- Using a disk index, you can scan far more data, and the file count could exceed 10 billion with a 500GB file name cache
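The arithmetic in Example A can be reproduced with the short sketch below. The helper function is illustrative; the 20% operating-system reservation, 50 bytes per filename, and 0.6% rule come from the requirements above:

def ram_index_plan(total_ram_gb, n_files):
    # Illustrative sizing: returns (RAM index in GB, max scannable data in TB).
    usable_gb = total_ram_gb * 0.8             # reserve ~20% for the operating system
    filename_cache_gb = n_files * 50 / 1e9     # 50 bytes per filename
    ram_index_gb = usable_gb - filename_cache_gb
    max_scan_tb = ram_index_gb / 0.006 / 1024  # 0.6% rule, reported in TB
    return int(ram_index_gb), int(max_scan_tb)

print(ram_index_plan(768, 100_000_000))   # Example A -> (609, 99)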
Example B: You have a server with 128GB of RAM and a Local SSD:
- ~26GB (20% of total) is reserved for the operating system, leaving 102GB of RAM
- Scanning 100 million files will occupy ~5GB of RAM, leaving 97GB of RAM
  - 50 bytes per filename
- This leaves 97GB of RAM available for the RAM index:
  - --ram-index-size-gb 97
- This can scan up to 15TB of data using just RAM, and no significant local SSD space is needed
  - This calculation is based on the 0.6% rule to accommodate similarity and deduplication hashes
- Using a disk index, you can scan far more data, and the file count could be as high as 2 billion with a 100GB file name cache
  - 15TB of data requires 90GB of local SSD disk
  - 100TB of data requires 600GB of local SSD disk
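The same arithmetic applies to Example B. Here is a self-contained sketch of the numbers above; the 20% reservation, filename cache, and 0.6% rule are taken from the earlier requirements:

usable_gb = 128 * 0.8                      # ~102 GB after the operating-system reservation
ram_index_gb = usable_gb - 5               # ~97 GB after the 100-million-file name cache
print(int(ram_index_gb / 0.006 / 1024))    # -> 15 (TB scannable from RAM alone)
print(15 * 1000 * 0.006)                   # -> 90.0 GB of local SSD for a 15 TB disk index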
Algorithm Specification
Here's pseudo-code which helps to explain how these calculations are done:
GB = 1024 ** 3   # bytes per GB (binary); assumed constant for this pseudo-code
# avail_b: RAM available on the host, in bytes
# n_files: number of files to be scanned (50 bytes of RAM per filename)
# index_size: required index size in bytes (~0.6% of the data to be scanned)
# args: parsed command-line flags (--ram-index-size-gb, --disk-index-size-gb)

# Reserve 20% of RAM for the operating system, then subtract the filename cache.
available_ram_bytes = (avail_b * 0.8) - (n_files * 50)
ram_index_size = args.ram_index_size_gb * GB
disk_index_size = args.disk_index_size_gb * GB
if disk_index_size == 0:
    # No disk index requested: use RAM if the index fits, otherwise fall back to disk.
    if ram_index_size == 0 and available_ram_bytes > index_size:
        ram_index_size = index_size
    if ram_index_size == 0:
        disk_index_size = index_size
# Enforce a 1 GB minimum on whichever index is in use.
if 0 < ram_index_size < GB:
    ram_index_size = GB
if 0 < disk_index_size < GB:
    disk_index_size = GB