Prerequisites
- Linux OS (Ubuntu 20 or CentOS 8 recommended)
  - Ubuntu 18 or CentOS 7 is the minimum required
- python3
- Docker
- screen
- wget
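The prerequisites above can be checked before going further. The following is a minimal sketch, not part of the probe bundle; the function name missing_tools is hypothetical, and it assumes Docker is installed as the docker command.

```shell
# Print the name of every tool that is not found on the PATH.
# missing_tools is a hypothetical helper, not part of the probe bundle.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}

# Check the tools listed under Prerequisites.
missing=$(missing_tools python3 docker screen wget)
if [ -n "$missing" ]; then
    echo "Missing prerequisites:" $missing
else
    echo "All prerequisites found."
fi
```

The check only confirms that each command exists; it does not verify versions or that the Docker daemon is running.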
Sizing The Probe Hardware or Virtual Machine
Review Hardware Requirements
- RAM is reserved for the operating system and the probe runtime image
  - 20% is automatically set aside by the probe launcher
- RAM is also used for the file name cache
  - This is 50 bytes per file scanned
- RAM or SSD-backed local disk is used for a hash database
  - The hash database is where the probe tracks scanned blocks
  - The hash database should be no less than 0.6% of the data size
  - If there is sufficient RAM, the probe will use RAM for the hash database
  - If there is insufficient RAM, the probe will use SSD-backed local disk for the hash database
  - Any remaining unallocated RAM will be used for a read cache
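The two sizing figures above (0.6% of the data size for the hash database, 50 bytes per file for the file name cache) can be turned into rough numbers. This is a sketch only; the function names are hypothetical, results are rounded up, and it assumes decimal units (1 MB = 1,000,000 bytes).

```shell
# Minimum hash database size in GB: 0.6% of the data size, rounded up.
hash_db_gb() {
    echo $(( ($1 * 6 + 999) / 1000 ))
}

# File name cache size in MB: 50 bytes per file scanned, rounded up.
file_cache_mb() {
    echo $(( ($1 * 50 + 999999) / 1000000 ))
}

hash_db_gb 10000        # 10,000 GB of data -> prints 60
file_cache_mb 1000000   # 1,000,000 files   -> prints 50
```

These are lower bounds taken from the guidelines above, not the launcher's exact allocation logic.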
Download Probe Bundle
To download the probe, refer to the instructions in Downloading the VAST Probe.
If you do not have access to the instructions, please contact your VAST representative. They can provide download instructions.
Expand & Verify Download
Now that you've downloaded the probe, you'll need to untar it and then verify the download is correct.
export PROBE_BUILD=935553
tar -xzf ${PROBE_BUILD}.probe.bundle.tar.gz
ls -l
Note: The build number shown above is an example and may not match the current build.
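Beyond reading the ls -l output by eye, the extraction can be checked with a short script. This sketch assumes the bundle extracts probe_launcher.py and ${PROBE_BUILD}.probe.image.gz into the current directory, which is inferred from the launch commands in this article rather than from a published file list; check_bundle_files is a hypothetical helper.

```shell
# Confirm that the files used by the launch command exist and are not empty.
# Assumption: the bundle extracts these two files into the given directory.
check_bundle_files() {
    build=$1
    dir=${2:-.}
    status=0
    for f in probe_launcher.py "${build}.probe.image.gz"; do
        if [ -s "${dir}/${f}" ]; then
            echo "found: ${f}"
        else
            echo "missing or empty: ${f}"
            status=1
        fi
    done
    return "$status"
}

# Usage, after the tar command above:
#   check_bundle_files "${PROBE_BUILD}"
```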
Mount Filesystems Selected to Be Probed
Validated Filesystems Include, But Are Not Limited To:
- NFS
- Lustre
- GPFS
- S3 with goofys
- CIFS/SMB
For the most accurate results, do not use root-squash.
Mounting the filesystem read-only is recommended.
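As an illustration of the read-only recommendation, the sketch below shows an NFS mount with the ro option and a helper that checks whether a mount's option string contains ro. The server name, export path, and the helper has_ro_option are hypothetical; findmnt is assumed to be available (it ships with util-linux on most distributions).

```shell
# Example read-only NFS mount (hypothetical server and export path):
#   sudo mount -t nfs -o ro filer.example.com:/export/data /mnt/filesystem_to_be_probed

# Succeed if a comma-separated mount option string contains the "ro" option.
has_ro_option() {
    case ",$1," in
        *,ro,*) return 0 ;;
        *)      return 1 ;;
    esac
}

# Usage against a live mount:
#   has_ro_option "$(findmnt -n -o OPTIONS /mnt/filesystem_to_be_probed)" \
#       && echo "mounted read-only"
```

Wrapping the string in commas keeps options such as errors=remount-ro from being mistaken for ro.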
Create Probe Directories
In the commands below, replace /mnt/ with the mount point of the SSD-backed local disk that the probe will use for the hash database and logging directories.
sudo mkdir -p /mnt/probe/db
sudo mkdir -p /mnt/probe/out
sudo chmod -Rf 777 /mnt/probe
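Because the hash database may land on this disk, it can help to confirm that the directory has enough free space. The following is a sketch using df; the helper names are hypothetical, and the required size is whatever you derive from the 0.6% guideline in the sizing section.

```shell
# Free space, in whole GiB, on the filesystem that holds the given path.
available_gb() {
    df -Pk "$1" | awk 'NR==2 { print int($4 / 1024 / 1024) }'
}

# Succeed if the path has at least the required number of GiB free.
has_space_for_db() {
    [ "$(available_gb "$1")" -ge "$2" ]
}

# Usage: a 10,000 GB data set needs roughly 60 GB for the hash database.
#   has_space_for_db /mnt/probe/db 60 || echo "not enough space for the hash database"
```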
Size of the Data Set
- The input to the probe is a defined directory (--input-dir)
- The probe will automatically query the input filesystem about space consumed and file count (inodes) and use that in its calculations
- Depending on the mount method and the underlying storage, this query can return inaccurate results
- It's highly recommended to provide manual estimates for space consumed (--data-size-gb) and file count (--number-of-files)
- These estimates do not have to be exact; round up to a reasonable figure
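One way to "round up reasonably" is to round an estimate to the next multiple of a convenient step. The helper below is a sketch; round_up and the step sizes are illustrative choices, not something the probe requires.

```shell
# Round a value up to the next multiple of step.
round_up() {
    value=$1
    step=$2
    echo $(( (value + step - 1) / step * step ))
}

round_up 14200 1000         # estimated GB    -> prints 15000
round_up 4650000 1000000    # estimated files -> prints 5000000
```

The rounded values can then be passed to --data-size-gb and --number-of-files.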
Running The Probe
The probe runs as a foreground application, so if your session closes for any reason, the probe will stop. Running the probe inside a screen session is recommended.
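A typical screen workflow looks like the sketch below. The session name vast_probe and the function names are hypothetical; the functions are only defined here, not run.

```shell
# Hypothetical session name; pick any name you like.
SESSION_NAME=vast_probe

# Start a named screen session; run the probe command inside it.
start_probe_session() {
    screen -S "$SESSION_NAME"
}

# Reattach to the session after a disconnect or a manual detach.
reattach_probe_session() {
    screen -r "$SESSION_NAME"
}

# List existing screen sessions.
list_probe_sessions() {
    screen -ls
}
```

Inside the session, press Ctrl-a followed by d to detach and leave the probe running.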
Here is an example command line. Edit the variable values to match your environment:
NOTE: Use underscores instead of spaces in COMPANY_NAME and WORKLOAD
export DB_DIR=/mnt/probe/db
export OUTPUT_DIR=/mnt/probe/out
export INPUT_DIR=/mnt/filesystem_to_be_probed/sub_directory
export INPUT_SIZE_GB=10000
export QTY_FILES=1000000
export COMPANY_NAME=Your_Amazing_Company
export WORKLOAD=Describe_Your_Workload
Start the probe (it may take up to five minutes to start displaying output):
sudo python3 ./probe_launcher.py \
--probe-image-path ${PROBE_BUILD}.probe.image.gz \
--input-dir $INPUT_DIR \
--metadata-dir $DB_DIR \
--output-dir $OUTPUT_DIR \
--data-size-gb $INPUT_SIZE_GB \
--number-of-files $QTY_FILES \
--customer-name ${COMPANY_NAME}---${WORKLOAD}
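To follow the note about underscores in COMPANY_NAME and WORKLOAD, names containing spaces can be converted before they are exported. This is a small sketch; to_underscores is a hypothetical helper, and the sample names are placeholders.

```shell
# Replace every space in a string with an underscore.
to_underscores() {
    printf '%s\n' "$1" | tr ' ' '_'
}

COMPANY_NAME=$(to_underscores "Your Amazing Company")
WORKLOAD=$(to_underscores "Describe Your Workload")
export COMPANY_NAME WORKLOAD

# This is the value passed to --customer-name.
echo "${COMPANY_NAME}---${WORKLOAD}"   # prints Your_Amazing_Company---Describe_Your_Workload
```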
Example One: Small Data Sets
To probe the directory interesting_data, which holds 15 TB in use and 5,000,000 files, at the company ACME, the command would be:
sudo python3 ./probe_launcher.py \
--probe-image-path ${PROBE_BUILD}.probe.image.gz \
--input-dir /mnt/acme_filer/interesting_data \
--metadata-dir /mnt/data/probe/db \
--output-dir /mnt/data/probe/out \
--data-size-gb 15000 \
--number-of-files 5000000 \
--customer-name ACME---Interesting_Data
Example Two: Larger Data Sets
To probe the directory fascinating_data, which holds 60 TB in use and 750,000,000 files, at the company FOO, using defined parameters for RAM and SSD-backed local disk, the command would be:
sudo python3 ./probe_launcher.py \
--probe-image-path ${PROBE_BUILD}.probe.image.gz \
--input-dir /mnt/foo_filer/fascinating_data \
--metadata-dir /mnt/data/probe/db \
--output-dir /mnt/data/probe/out \
--data-size-gb 60000 \
--number-of-files 750000000 \
--customer-name FOO---Fascinating_Data
Example Three: Performance Throttling
To probe the directory riveting_data, which holds 250 TB in use and 1,250,000,000 files, at the company Initech, using defined parameters for RAM and SSD-backed local disk while keeping the performance impact on the filesystem low, the command would be:
sudo python3 ./probe_launcher.py \
--probe-image-path ${PROBE_BUILD}.probe.image.gz \
--input-dir /mnt/initech_filer/riveting_data \
--metadata-dir /mnt/data/probe/db \
--output-dir /mnt/data/probe/out \
--data-size-gb 250000 \
--number-of-files 1250000000 \
--number-of-threads 4 \
--customer-name Initech---Riveting_Data
Note the --number-of-threads flag. By default, the probe uses all CPU cores in the system, but this flag can be used to throttle the probe and reduce its potential impact on the scanned filesystem.
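If you are unsure what value to pass, one option is to derive it from the core count. The sketch below uses a fraction of the available cores with a floor of one thread; the helper name and the choice of divisor are assumptions for illustration, not a VAST recommendation.

```shell
# Thread count as a fraction of the CPU cores, never less than 1.
throttled_threads() {
    cores=$1
    divisor=$2
    threads=$(( cores / divisor ))
    if [ "$threads" -lt 1 ]; then
        threads=1
    fi
    echo "$threads"
}

throttled_threads 16 4   # prints 4
throttled_threads 2 4    # prints 1

# Usage with the real core count (nproc is part of GNU coreutils):
#   --number-of-threads "$(throttled_threads "$(nproc)" 4)" \
```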
Other Probe Flags
While the probe is running and after completion, telemetry logs are automatically uploaded to VAST. To prevent this, add the following flag:
--dont-send-logs \
If you wish to send file names with the default telemetry logs, add the following flag:
--send-logs-with-file-names \
Probing filesystems that contain snapshots can often cause recursion issues and inaccurate results. As a result, the probe automatically ignores directories named .snapshot. If your filesystem uses another convention, use the --regexp-filter flag. If for some reason you want the probe to read the .snapshot directories, specify false rather than true for --filter-snapshots.
--filter-snapshots \ (this is the default)
Adaptive chunking was introduced with VAST 4.3 and this latest probe version. Under most circumstances, the probe should be run with adaptive chunking. However, you can disable the feature by specifying this flag:
--disable-adaptive-chunking \
Understanding the Results
Once started, the probe displays its current projection of potential data reduction. When it completes, the probe displays its final output, which is described further in Understanding VAST Probe Output.
Re-Running The Probe
The hash database must be empty before running the probe again:
sudo rm -r /mnt/probe/db/*
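Because this command deletes files recursively, a guarded version can protect against an empty or mistyped path. The following is a sketch; clear_hash_db is a hypothetical helper, and it may need to be run with root privileges depending on who owns the files.

```shell
# Remove the contents of the hash database directory, keeping the directory itself.
# Refuses to run if the argument is empty or is not a directory.
clear_hash_db() {
    db_dir=${1:?usage: clear_hash_db DB_DIR}
    if [ ! -d "$db_dir" ]; then
        echo "not a directory: $db_dir" >&2
        return 1
    fi
    rm -rf -- "${db_dir:?}"/*
}

# Usage:
#   clear_hash_db /mnt/probe/db
```

Like the original command, the glob does not match hidden files, so any dotfiles in the directory are left in place.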
Troubleshooting
Refer to the VAST Probe Troubleshooting document and contact your VAST System Engineer for assistance.