Elbencho is an open source benchmarking tool. It can be used for file and object storage systems, which is useful to test the multi-protocol access of a VAST system. Elbencho also makes it easy to run distributed tests across multiple clients, which is typically required to drive the full throughput of a VAST system.
Getting elbencho
Elbencho can be built from source, but that's not necessary. It's easier to just download the static binary, which runs on any Linux distribution without the need to even install an rpm/deb package: https://github.com/breuner/elbencho/releases/
Running elbencho with multiple clients
To run coordinated elbencho tests across multiple clients, you first start it in "service mode" on all clients that you want to use for a test, like this:
$ elbencho --service
Then the elbencho service just sits there idle in the background and waits for commands from a master instance. In addition to the normal benchmark parameters, the master instance will simply specify the hostnames of the service instances either directly on the command line like this:
$ elbencho --hosts node010,node011,node012,...
...or through a hostsfile, which contains the corresponding hostnames or IP addresses of the service nodes newline-separated:
$ elbencho --hostsfile /path/to/myhostsfile.txt
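A hostsfile is just a plain text file with one hostname or IP address per line, for example (hypothetical hostnames):
node010
node011
node012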
All examples below can also be used on a single client by just omitting the "--hosts"/"--hostsfile" arguments.
Simple max bandwidth tests for starters
Choosing the right number of clients
The exact number of clients and threads needed to drive the full bandwidth of a VAST system depends on the client and server hardware. But very roughly speaking, a single VAST protocol server (CNode) can serve data at about 10GB/s (100Gbit/s), which gives you an indication of the number of clients needed for full bandwidth, depending on their network interconnect.
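As a rough, hypothetical example: a system with 8 CNodes could deliver on the order of 80GB/s in aggregate, so clients connected through a single 100Gbit link (roughly 10-12GB/s each) would need to come in a group of about 8 or more to have a chance of saturating the system.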
While there generally is more overhead in an individual S3 request in comparison to an NFS request (e.g. a single 4KB read via S3 is significantly more "expensive" than a single 4KB read via NFS), VAST sees all protocols as "first class citizens" of the architecture and thus enables high bandwidth also via S3.
General considerations for max bandwidth
Otherwise, the same rules apply for S3 benchmarking as for the other access protocols on VAST: It doesn't make sense to write only zeros (as many benchmarking tools do by default), because the VAST system would deduplicate all of those writes into a single block. For that reason, elbencho generates non-reducible data by default, so no extra parameter is needed for this.
Also, the workload needs to be nicely spread across the VAST CNodes, which you do by providing all VAST VIPs to elbencho via the "--s3endpoints" parameter.
The full read bandwidth of the VAST system will only be unleashed when the data has been migrated from the SCM write buffer to the data drives. In a production system, this continuously happens in the background as new data gets ingested into the system. But in a benchmarking environment, it's a good idea to write about 4TB of data per VAST NVMe enclosure (DBox) to ensure that the majority of the data has been migrated.
The actual bandwidth test
A simple bandwidth test can use a fixed number of large objects, so that the same dataset can be used independent of the number of clients and independent of the number of threads per client that will later be used to read the data back. For this, the object names can be provided directly to elbencho as command line parameters.
The following example assumes that you have already created a bucket named "mybucket" and that you have generated an S3 access key and secret key pair for a user. Also, the 8 VIPs in this example need to be replaced by the actual range of VIPs of your VAST system. (If needed, you can also add "-d" to have elbencho create the bucket; see the variant below.)
This command will write ("-w") 256 objects in 16MiB blocks ("-b 16m"), each of the objects being 16GiB in size ("-s 16g"), for a total of 4TiB. The number of threads ("-t 48") is per-client.
$ elbencho --s3endpoints "$(echo http://172.200.203.{1..8})" --s3key="..." --s3secret="..." -w -t 48 -s 16g -b 16m --hostsfile myhosts.txt mybucket/bigobjects/file{1..256}
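If the bucket does not exist yet, a sketch of the same command with "-d" added (as mentioned above) lets elbencho create the bucket before writing the objects:
$ elbencho --s3endpoints "$(echo http://172.200.203.{1..8})" --s3key="..." --s3secret="..." -d -w -t 48 -s 16g -b 16m --hostsfile myhosts.txt mybucket/bigobjects/file{1..256}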
This dataset size of 4TiB is appropriate for a single VAST DBox. For more DBoxes, you would just linearly increase the object size (e.g. "-s 32g" for two DBoxes).
The same command, just with "-r" (read) instead of "-w" (write), can be used to read the data back.
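For reference, the full read command for the dataset above would look like this:
$ elbencho --s3endpoints "$(echo http://172.200.203.{1..8})" --s3key="..." --s3secret="..." -r -t 48 -s 16g -b 16m --hostsfile myhosts.txt mybucket/bigobjects/file{1..256}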
In the S3 world, there is not really a concept of random writes, but at least the reads could be done from random offsets by adding the "--rand" parameter. And of course the dataset could also be read back using different thread counts, different client counts or different block sizes of interest (e.g. "-b 1m" for 1MiB reads).
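For example, a random-offset read of the same dataset in 1MiB blocks would just be a small variation of the read command above:
$ elbencho --s3endpoints "$(echo http://172.200.203.{1..8})" --s3key="..." --s3secret="..." -r --rand -t 48 -s 16g -b 1m --hostsfile myhosts.txt mybucket/bigobjects/file{1..256}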
Multiprotocol access
Since there is no real difference between a file and an object in a VAST system, you might be interested in trying to read your objects back as files. The corresponding elbencho command for the dataset generated above would look like this, assuming the NFS mountpoint "/mnt/vast/mybucket" refers to the S3 bucket from the previous section:
$ elbencho -r -t 48 -s 16g -b 16m --hostsfile myhosts.txt /mnt/vast/mybucket/bigobjects/file{1..256}
Multiple objects per thread
Specifying the object names directly on the command line as elbencho parameters only works for a relatively limited number of objects, such as the 256 in the examples above. For tests with significantly more objects, you would rather specify a certain number of subdirs and objects per subdir for each thread. In this case, you would only provide the bucket name as argument and add "-n" (number of subdirs per thread) and "-N" (number of files per subdir) as parameters, like this:
$ elbencho --s3endpoints "$(echo http://172.200.203.{1..8})" --s3key="..." --s3secret="..." -w -t 48 -s 16m -b 16m -n 5 -N 10 --hostsfile myhosts.txt mybucket
This command will make each thread create 5 subdirs ("-n 5"), inside which it will create 10 objects ("-N 10") of 16MiB in size each ("-s 16m"), each object uploaded in a single 16MiB request ("-b 16m"). With e.g. 4 clients, this would mean a total of 9600 objects: 4 clients x 48 threads_per_client x 5 subdirs_per_thread x 10 objects_per_subdir.
The dataset can be read back by using the same command with "-r" (read) instead of "-w" (write), but since it has been created for a certain number of threads, it can only be read back with the same or a lower number of threads.
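For reference, the corresponding S3 read command would be:
$ elbencho --s3endpoints "$(echo http://172.200.203.{1..8})" --s3key="..." --s3secret="..." -r -t 48 -s 16m -b 16m -n 5 -N 10 --hostsfile myhosts.txt mybucket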
And again, the same dataset could also be read back via NFS like this:
$ elbencho -r -t 48 -s 16m -b 16m -n 5 -N 10 --hostsfile myhosts.txt /mnt/vast/mybucket
Keeping results
By using elbencho's service mode, you will have a single aggregate result for all clients instead of having to gather individual results and checking that all clients really ran at the same time. Elbencho shows two result sets for each run: The aggregate end result ("last done", referring to the point in time when the slowest thread finished its work) and the aggregate "first done" result, referring to the point in time when the fastest thread/client finished its work. The phase between "first done" and "last done" is called the "tail" and is usually a phase of lower throughput based on the fact that fewer threads are active.
To preserve the human-readable results that are shown on the console, you can use the "--resfile /path/to/results.txt" parameter.
To write the end results into a csv file, you can use the "--csvfile /path/to/results.csv" parameter.
elbencho will append to result files and not overwrite them if they already exist. This can be useful, for example, to build graphs in a spreadsheet application from throughput results for different object sizes or block sizes collected in a single csv file.
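As a sketch of how this could be used (reusing the large-object dataset and parameters from the earlier examples), a small shell loop can append read results for several block sizes to the same csv file:
$ for bs in 1m 4m 16m; do elbencho --s3endpoints "$(echo http://172.200.203.{1..8})" --s3key="..." --s3secret="..." -r -t 48 -s 16g -b $bs --hostsfile myhosts.txt --csvfile /path/to/results.csv mybucket/bigobjects/file{1..256}; done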
You might also find the following command useful to view csv file contents on the console:
$ column -ts, /path/to/results.csv | less
Additional notes and limitations for S3
- Different from the file world, in the S3 world objects only appear in the namespace when they are completely uploaded. That means that during the upload you won't see the object when you list the bucket via S3 or the corresponding directory via NFS; and if you press CTRL+C in the middle of writing a large object, elbencho will notify the S3 server to discard the partially uploaded content of the current object.
- There are no subdirectories in the S3 world, but there is a concept of "separators" in object names, which can be used to group sets of objects within the same bucket together - conceptually similar to subdirectories. Not surprisingly, the slash ("/") is a commonly used separator in the S3 world. Thus, to bring everything nicely together with the file world, VAST systems interpret elements of object names with a trailing slash as directory names.
- In the S3 world, there are simple PUTs (i.e. upload of an object through a single HTTP request, which only makes sense for small or medium-sized objects) and multi-part uploads (i.e. upload through multiple HTTP requests, which makes sense for larger objects). If the given elbencho block size is equal to the given object size (e.g. "elbencho -b 16m -s 16m"), then elbencho automatically uses a simple single PUT for the upload. If the block size is smaller than the object size (e.g. "elbencho -b 16m -s 1g"), then elbencho automatically uses a multi-part upload. (See the example commands after this list.)
- The block size (-b) needs to be allocated in RAM by each thread, hence it wouldn't be practical to upload e.g. a 1TB object without using multi-part upload.
- Amazon defined that a multi-part upload cannot have more than 10,000 parts. VAST implements this limitation. That means "elbencho -w -s 1g -b 4k mybucket/myobj1 --s3endpoints ..." would not work, because 1GiB divided by 4KiB is 262,144, so this would result in a multi-part upload of far more than 10,000 parts. The same example with "-b 1m" would work, because 1GiB divided by 1MiB results in only 1,024 parts being uploaded for the object.
- Amazon defined that an individual part of a multi-part upload cannot be smaller than 5MiB (except for the last part). VAST does not implement this limitation. That means "elbencho -w -s 10m -b 1m" would not work on most object stores, but works on VAST. Consequently, typical S3 applications use a block size of at least 5MiB when writing larger objects.
- For reads, the multi-part upload limitations do not apply, so e.g. reading a 1GiB file in 4KiB blocks would work, but due to the overhead of the S3 protocol for very small requests, it's probably something that normal S3 applications would try to avoid.
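To illustrate the simple PUT versus multi-part upload behavior mentioned above, here are two hypothetical small-scale write commands (endpoints, keys and thread count chosen arbitrarily): the first results in a single PUT per object, the second in a multi-part upload with 64 parts (1GiB / 16MiB) per object.
$ elbencho --s3endpoints "$(echo http://172.200.203.{1..8})" --s3key="..." --s3secret="..." -w -t 4 -s 16m -b 16m mybucket/putobjects/file{1..16}
$ elbencho --s3endpoints "$(echo http://172.200.203.{1..8})" --s3key="..." --s3secret="..." -w -t 4 -s 1g -b 16m mybucket/mpuobjects/file{1..16}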