Note
Comprehensive documentation for Boto 3 can be found on Boto 3's official documentation site. Please use the official Boto 3 documentation as needed in conjunction with our specific guidelines for using Boto 3 with VAST Cluster.
Boto 3 is an SDK for Python that enables you to interact with AWS services, including the S3 service, through Python code. Boto 3 offers two ways to interface with the S3 APIs: a low-level client interface and a higher-level resource interface.
You can use either interface to access the S3 APIs that VAST Cluster supports.
In both cases, when you create the client or resource, it creates a default session which manages configuration and state. The session looks for configuration in various places, including several configuration files. However, configuration files do not allow you to specify an endpoint URL, so we will show you how to pass all configuration parameters, including the S3 endpoint, S3 user credentials, and other settings, in the client or resource creation call.
Install the SDK
Start by installing the Boto 3 SDK for Python:
$ pip install boto3
Choose an interface
To connect to the low-level client interface, use Boto3’s client() method. You must pass your VAST Cluster S3 credentials and other configurations as parameters with hardcoded values. This is the only way to specify a VAST Cluster VIP as the S3 endpoint.
The following example imports the boto3 module and instantiates a client with the minimum configuration needed for connecting the client to your VAST Cluster S3 account over an HTTP connection:
import boto3

s3_client = boto3.client(
    's3',
    use_ssl=False,
    endpoint_url='<ENDPOINT-URL>',
    aws_access_key_id='<ACCESS-KEY>',
    aws_secret_access_key='<SECRET-KEY>',
    region_name='<REGION>',
    config=boto3.session.Config(
        signature_version='s3v4',
        s3={'addressing_style': 'path'}
    )
)
in which:
- <ENDPOINT-URL> can be any of the cluster's virtual IPs, prefixed by http://. For example, http://198.51.100.255, in which 198.51.100.255 is one of the cluster's VIPs.
  Note
  To retrieve the cluster's virtual IPs:
  - In the VAST Web UI, open the menu, select Configuration and then select the Virtual IPs tab. The Virtual IPs list shows you which virtual IPs are configured on each CNode.
  - In the VAST CLI, run the vip list command.
- <ACCESS-KEY> and <SECRET-KEY> are your S3 key pair.
- <REGION> can be any string. It is required if signature_version='s3v4'.
For an HTTPS connection, pass parameters as follows in the client() call:
- Enable HTTPS by setting use_ssl=True instead of use_ssl=False.
- If the default certificate trust store does not recognize the signer of the installed certificate, you can use the verify parameter to specify a non-default path to the certificate trust store. If you're using a self-signed certificate, you can point this to the certificate itself. For example: verify="path/to/client/cert.pem"
- Alternatively, you can use the verify parameter to disable verification: verify=False
Once you have an instance of the S3 service client, you can call the create_bucket method on the client instance to create a bucket.
In this example, we create a bucket called mybucket.
response = s3_client.create_bucket(
    Bucket='mybucket'
)
The list_buckets() method returns a list of all buckets owned by the authenticated sender of the request.
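As a minimal sketch, the bucket names can be collected from the list_buckets() response like this (the helper name list_bucket_names is ours; s3_client is assumed to be a client created as shown earlier):

```python
def list_bucket_names(s3_client):
    """Return the names of all buckets owned by the authenticated sender.

    s3_client is assumed to be a Boto 3 S3 client created as shown above.
    """
    response = s3_client.list_buckets()
    # 'Buckets' is a list of dicts, each with 'Name' and 'CreationDate' keys.
    return [bucket['Name'] for bucket in response['Buckets']]
```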
The list_objects() method returns some or all (up to 1000) of the objects in a bucket. You can use the request parameters as selection criteria to return a subset of the objects in a bucket.
This example retrieves the list of objects in the bucket "mybucket".
response = s3_client.list_objects(
    Bucket='mybucket',
)
The head_bucket() method is used to determine if a bucket exists and if the user has permission to access it.
The delete_bucket() method deletes a bucket. All objects in the bucket must be deleted before the bucket can be deleted.
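A sketch of emptying and then deleting a bucket (the helper name empty_and_delete_bucket is ours; s3_client is assumed to be a client created as shown earlier; list_objects() returns at most 1000 objects per call, so a very large bucket would need repeated passes):

```python
def empty_and_delete_bucket(s3_client, bucket_name):
    """Delete all objects in a bucket, then delete the bucket itself."""
    # Collect the keys currently in the bucket (up to 1000 per call).
    response = s3_client.list_objects(Bucket=bucket_name)
    keys = [{'Key': obj['Key']} for obj in response.get('Contents', [])]
    if keys:
        s3_client.delete_objects(Bucket=bucket_name, Delete={'Objects': keys})
    # The bucket must be empty before it can be deleted.
    s3_client.delete_bucket(Bucket=bucket_name)
```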
Before setting ACL permissions, we recommend you read S3 Access Control Lists (ACLs).
The put_bucket_acl() method sets the permissions on a bucket using access control lists (ACLs).
To grant permission to a user, specify CanonicalUser as the 'Type' and pass the user's VID as the 'ID'.
Tip
The VID can be found by running the user list VAST CLI command. For information about connecting to the VAST CLI, see Connecting to the VAST CLI.
To grant permission to a predefined group, specify Group as the 'Type' and pass the group's URI as the 'URI':
- For the All Users group: 'http://acs.amazonaws.com/groups/global/AllUsers'
- For the Authenticated Users group: 'http://acs.amazonaws.com/groups/global/AuthenticatedUsers'
In this example, a user with VID 54 is granted full control permission to the bucket BobsBucket owned by BSmith whose VID is 4.
response = s3_client.put_bucket_acl(
    AccessControlPolicy={
        'Grants': [
            {
                'Grantee': {
                    'ID': '54',
                    'Type': 'CanonicalUser',
                },
                'Permission': 'FULL_CONTROL'
            },
        ],
        'Owner': {
            'DisplayName': 'BSmith',
            'ID': '4'
        }
    },
    Bucket='BobsBucket',
)
In the following example, the Authenticated Users group is granted READ permission on the bucket BobsBucket.
response = s3_client.put_bucket_acl(
    AccessControlPolicy={
        'Grants': [
            {
                'Grantee': {
                    'Type': 'Group',
                    'URI': 'http://acs.amazonaws.com/groups/global/AuthenticatedUsers'
                },
                'Permission': 'READ'
            },
        ],
        'Owner': {
            'DisplayName': 'BSmith',
            'ID': '4'
        }
    },
    Bucket='BobsBucket',
)
The get_bucket_acl() method retrieves the ACL of a bucket.
To learn about VAST Cluster's support for S3 ACLs, read S3 Access Control Lists (ACLs).
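A sketch of reading the grants back out of the get_bucket_acl() response (the helper name bucket_grants is ours; s3_client is assumed to be a client created as shown earlier):

```python
def bucket_grants(s3_client, bucket_name):
    """Return (grantee, permission) pairs from a bucket's ACL."""
    acl = s3_client.get_bucket_acl(Bucket=bucket_name)
    # Each grant pairs a grantee (user or group) with a permission.
    return [(grant['Grantee'], grant['Permission']) for grant in acl['Grants']]
```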
The copy_object() method creates a copy of an object already stored on the server.
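A sketch of a server-side copy (the helper name copy_within_server is ours; s3_client is assumed to be a client created as shown earlier):

```python
def copy_within_server(s3_client, src_bucket, src_key, dst_bucket, dst_key):
    """Copy an existing object to a new key without re-uploading the data."""
    return s3_client.copy_object(
        Bucket=dst_bucket,
        Key=dst_key,
        # CopySource can also be given as a 'bucket/key' string.
        CopySource={'Bucket': src_bucket, 'Key': src_key},
    )
```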
The get_object() method retrieves an object.
To download a specified range of bytes of an object, use the Range parameter. For more information about the HTTP Range header, go to http://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html#sec14.35.
In this example, we download only bytes 32-64 of the object "MyObject" from the bucket "MyBucket".
response = s3_client.get_object(
    Bucket='MyBucket',
    Key='MyObject',
    Range='bytes=32-64',
)
The head_object() method retrieves metadata from an object without returning the object itself.
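A sketch of pulling the commonly used fields out of a head_object() response (the helper name object_metadata is ours; s3_client is assumed to be a client created as shown earlier):

```python
def object_metadata(s3_client, bucket_name, key):
    """Return the size, last-modified time and user metadata of an object."""
    response = s3_client.head_object(Bucket=bucket_name, Key=key)
    # head_object returns headers only; the object body is not downloaded.
    return {
        'size': response['ContentLength'],
        'last_modified': response['LastModified'],
        'metadata': response['Metadata'],
    }
```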
The delete_objects() method deletes multiple objects in a bucket.
response = s3_client.delete_objects(
    Bucket='mybucket',
    Delete={
        'Objects': [
            {'Key': 'file1'},
            {'Key': 'file2'},
            {'Key': 'file3'},
        ],
    },
)
Before setting ACL permissions, we recommend you read S3 Access Control Lists (ACLs).
The put_object_acl() method sets the permissions on an object using access control lists (ACL).
Syntax Notes
To grant permission to a user, specify CanonicalUser as the 'Type' and pass the user's VID as the 'ID'.
Tip
The VID can be found by running the user list VAST CLI command. For information about connecting to the VAST CLI, see Connecting to the VAST CLI.
To grant permission to a predefined group, specify Group as the 'Type' and pass the group's URI as the 'URI':
- For the All Users group: 'http://acs.amazonaws.com/groups/global/AllUsers'
- For the Authenticated Users group: 'http://acs.amazonaws.com/groups/global/AuthenticatedUsers'
In this example, a user with VID 3 is granted full control permission to the object my_object in the bucket my_bucket owned by JDoe whose VID is 2.
response = s3_client.put_object_acl(
    AccessControlPolicy={
        'Grants': [
            {
                'Grantee': {
                    'ID': '3',
                    'Type': 'CanonicalUser',
                },
                'Permission': 'FULL_CONTROL'
            },
        ],
        'Owner': {
            'DisplayName': 'JDoe',
            'ID': '2'
        }
    },
    Bucket='my_bucket',
    Key='my_object',
)
In this example, the predefined Authenticated Users group is granted WRITE permission to the object my_object in the bucket my_bucket owned by JDoe whose VID is 2.
response = s3_client.put_object_acl(
    AccessControlPolicy={
        'Grants': [
            {
                'Grantee': {
                    'Type': 'Group',
                    'URI': 'http://acs.amazonaws.com/groups/global/AuthenticatedUsers'
                },
                'Permission': 'WRITE'
            },
        ],
        'Owner': {
            'DisplayName': 'JDoe',
            'ID': '2'
        }
    },
    Bucket='my_bucket',
    Key='my_object',
)
The get_object_acl() method returns an object's ACL.
To learn about VAST Cluster's support for S3 ACLs, read S3 Access Control Lists (ACLs).
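A sketch of reading the grants back out of the get_object_acl() response (the helper name object_grants is ours; s3_client is assumed to be a client created as shown earlier):

```python
def object_grants(s3_client, bucket_name, key):
    """Return (grantee, permission) pairs from an object's ACL."""
    acl = s3_client.get_object_acl(Bucket=bucket_name, Key=key)
    return [(grant['Grantee'], grant['Permission']) for grant in acl['Grants']]
```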
The create_multipart_upload() method initiates a multipart upload and returns an upload ID.
After initiating the multipart upload, you then need to upload all parts and then complete the upload.
The abort_multipart_upload() method aborts a multipart upload after it was initiated.
After a multipart upload is aborted, no additional parts can be uploaded using the upload ID of that multipart upload. The storage consumed by any previously uploaded parts will be freed. However, if any part uploads are currently in progress, those part uploads might or might not succeed. As a result, it might be necessary to abort a given multipart upload multiple times in order to completely free all storage consumed by all parts.
The complete_multipart_upload() method completes a multipart upload by assembling previously uploaded parts.
The upload_part() method uploads a part in a multipart upload that was already initiated.
After uploading all parts, the upload needs to be completed.
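The initiate/upload/complete flow, with an abort on failure, can be sketched end to end (the helper name multipart_upload is ours; s3_client is assumed to be a client created as shown earlier; the minimum part size of 5 MiB for all parts except the last is a general S3 constraint):

```python
def multipart_upload(s3_client, bucket_name, key, chunks):
    """Upload an object in parts; abort and re-raise on any failure.

    chunks is an iterable of bytes objects, each at least 5 MiB
    except possibly the last.
    """
    upload_id = s3_client.create_multipart_upload(
        Bucket=bucket_name, Key=key)['UploadId']
    try:
        parts = []
        for part_number, chunk in enumerate(chunks, start=1):
            response = s3_client.upload_part(
                Bucket=bucket_name, Key=key, UploadId=upload_id,
                PartNumber=part_number, Body=chunk)
            # The ETag of each part is needed to complete the upload.
            parts.append({'PartNumber': part_number, 'ETag': response['ETag']})
        return s3_client.complete_multipart_upload(
            Bucket=bucket_name, Key=key, UploadId=upload_id,
            MultipartUpload={'Parts': parts})
    except Exception:
        # Free the storage consumed by any parts uploaded so far.
        s3_client.abort_multipart_upload(
            Bucket=bucket_name, Key=key, UploadId=upload_id)
        raise
```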
The upload_part_copy() method uploads a part of a multipart upload by copying data from an existing object as the data source.
The list_parts() method lists the parts that have been uploaded for a specific multipart upload.
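A sketch of extracting the part numbers from a list_parts() response (the helper name uploaded_part_numbers is ours; s3_client is assumed to be a client created as shown earlier):

```python
def uploaded_part_numbers(s3_client, bucket_name, key, upload_id):
    """Return the part numbers uploaded so far for a multipart upload."""
    response = s3_client.list_parts(
        Bucket=bucket_name, Key=key, UploadId=upload_id)
    # 'Parts' is absent when no parts have been uploaded yet.
    return [part['PartNumber'] for part in response.get('Parts', [])]
```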
The get_bucket_versioning() method returns the versioning state of a bucket. To retrieve the versioning state of a bucket, you must be the bucket owner.
The list_object_versions() method returns metadata about all versions of the objects in a bucket. You can also use request parameters as selection criteria to return metadata about a subset of all the object versions.
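The two methods can be combined into one sketch that checks the versioning state and then lists versions (the helper name object_versions is ours; s3_client is assumed to be a client created as shown earlier):

```python
def object_versions(s3_client, bucket_name):
    """Return (key, version id, is-latest) for every object version."""
    state = s3_client.get_bucket_versioning(Bucket=bucket_name)
    # 'Status' is absent when versioning has never been enabled on the bucket.
    if state.get('Status') != 'Enabled':
        return []
    response = s3_client.list_object_versions(Bucket=bucket_name)
    return [(v['Key'], v['VersionId'], v['IsLatest'])
            for v in response.get('Versions', [])]
```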
The S3 resource class has actions that are S3 service operations that pre-fill some of the parameters for you.
Actions can return three types of responses:
- A low-level response that is a Python dictionary which you're responsible for handling
- A resource
- A list of resources
Sub-resources are actions that create new instances of child resources. The parent resource's identifiers are passed to the child resource. The S3 service resource has several sub-resources, notably Bucket and Object, which enable you to construct buckets and objects in the buckets.
Some resources have collections, which are groups of resources, allowing you to iterate through, filter and manipulate resources. For example, a bucket can have a collection of objects that you can iterate through to access information such as which objects are in the bucket.
To connect to the S3 service using a resource, import the Boto3 module and then call Boto3's resource() method, specifying 's3' as the service name to create an instance of an S3 service resource. You must pass your VAST S3 credentials and other configurations as parameters into the resource() method. See the Boto3 resource() documentation page for a complete list and description of the configuration parameters that can be passed in the resource() method. See also S3 Service Resource page.
The following parameters must be passed in the resource() call since the endpoint URL and access key pair must be hardcoded:
- endpoint_url (string). The endpoint URL can be any of the cluster's virtual IPs, prefixed by http:// if you are disabling SSL or https:// if you are enabling SSL. For example, http://198.51.100.255, in which 198.51.100.255 is one of the cluster's VIPs.
  Note
  To retrieve the cluster's virtual IPs:
  - In the VAST Web UI, open the menu, select Configuration and then select the Virtual IPs tab. The Virtual IPs list shows you which virtual IPs are configured on each CNode.
  - In the VAST CLI, run the vip list command.
- aws_access_key_id (string) and aws_secret_access_key (string). Pass your S3 key pair values in these parameters.
- region_name (string). Required if you are passing signature_version='s3v4'. You can use any string value for the region name.
For an HTTPS connection, pass parameters as follows in the resource() call:
- Enable HTTPS by setting use_ssl=True.
- If the default certificate trust store does not recognize the signer of the installed certificate, you can use the verify parameter to specify a non-default path to the certificate trust store. If you're using a self-signed certificate, you can point this to the certificate itself. For example: verify="path/to/client/cert.pem"
- Alternatively, you can use the verify parameter to disable verification: verify=False
In this example, some optional advanced session configurations are passed in. The endpoint URL, S3 user access key and secret key are hardcoded, as needed. SSL is disabled, so the connection will be over HTTP.
# Import the boto3 module
import boto3

# Set advanced configuration parameters
boto_params = {'s3': {'addressing_style': 'path'}, 'signature_version': 's3v4'}

# Create a resource for the S3 service, passing configurations and credentials as parameters
s3_resource = boto3.resource(
    's3',
    use_ssl=False,
    endpoint_url='<ENDPOINT-URL>',
    aws_access_key_id='<ACCESS-KEY>',
    aws_secret_access_key='<SECRET-KEY>',
    region_name='<REGION>',
    config=boto3.session.Config(**boto_params),
)
Note
All further examples on this page assume an S3 service resource named s3_resource was created.
The buckets collection is the collection of all buckets on the S3 server, and it has a method all() that creates an iterable of all the bucket resources in the collection. To return all the bucket resources on the server, you can iterate through all the buckets returned by buckets.all().
The bucket resource has an identifier name and an attribute creation_date.
In this example, we iterate through all the buckets on the server. We print out the name of each bucket using the identifier name, which must be set for each bucket.
for bucket in s3_resource.buckets.all():
    print(bucket.name)
Once you have an instance of the S3 service resource, you can call the create_bucket action on the resource to create a bucket. This returns a new bucket resource.
The operation is idempotent, so it will either create the bucket or return the existing bucket if it already exists.
Note
You can optionally set ACL for the bucket while creating the bucket by passing optional parameters. For details of ACL support, see S3 Access Control Lists (ACLs). For request syntax details, see the boto3 documentation for create_bucket.
In this example, we create a bucket called mybucket as a child resource of the S3 service resource s3_resource.
s3_resource.create_bucket(Bucket="mybucket")
The collection bucket.objects is the collection of all objects in a bucket and it has a method called all() that creates an iterable of all the object resources in the collection. To list all the objects in a bucket, you can iterate through the list of objects returned by bucket.objects.all().
The object resource has several identifiers and attributes that you can access this way, listed in the object resource documentation.
In this example, we instantiate an S3 bucket by its name "the_bucket" and iterate through all of the objects in the bucket. We print out the name of each object, using the key identifier, and the last date each object was modified, using the last_modified attribute.
bucket = s3_resource.Bucket("the_bucket")
for s3_object in bucket.objects.all():
    print(s3_object.key, s3_object.last_modified)
To delete a bucket, call the delete() action on the bucket resource.
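Since all objects must be deleted first, a sketch of the two steps together (the helper name delete_bucket_resource is ours; the bucket argument is assumed to be an s3_resource.Bucket(...) instance):

```python
def delete_bucket_resource(bucket):
    """Empty a bucket resource and then delete it.

    bucket is assumed to be an s3_resource.Bucket(...) instance.
    """
    # All objects must be deleted before the bucket itself can be deleted.
    bucket.objects.all().delete()
    bucket.delete()
```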
Before setting ACL permissions, we recommend you read S3 Access Control Lists (ACLs).
Call the put() action on a bucket's BucketAcl resource to set ACL permissions on the bucket.
In this example, a user with VID 3 is granted full control permission to the bucket my_bucket owned by JDoe whose VID is 2.
s3_resource.BucketAcl('my_bucket').put(
    AccessControlPolicy={
        'Grants': [
            {
                'Grantee': {
                    'ID': '3',
                    'Type': 'CanonicalUser'
                },
                'Permission': 'FULL_CONTROL'
            },
        ],
        'Owner': {
            'DisplayName': 'JDoe',
            'ID': '2'
        }
    },
)
You can use the grants attribute of the ACL resource to access grantee information, and the owner attribute of the ACL resource to access owner information.
In this example, we iterate through the ACL grants and print out the user name and permission of each grantee who has permission to access the bucket.
acl = bucket.Acl()
for grant in acl.grants:
    print(grant['Grantee']['DisplayName'], grant['Permission'])
You can call the put_object() action on the bucket resource to add an object to the bucket.
In this example, we add an object with key "object_name" and body b"object data" (a bytes literal; the Body parameter accepts bytes or a file-like object) to the "the_bucket" bucket, specifying a checksum value to store with the object as user metadata.
s3_resource.Bucket("the_bucket").put_object(
    Key="object_name",
    Body=b"object data",
    Metadata={'checksum': '1234567890'}
)
The copy_from() action on the object resource creates a copy of an object already stored on the server.
In this example, we create a new object called file2 in bucket bucket2 by copying an object called file1 that is stored in bucket bucket1. CopySource is the only required parameter for copy_from(). In this case, we are specifying the source in string format. See copy_from() for full syntax details.
s3_resource.Object('bucket2','file2').copy_from(CopySource='bucket1/file1')
The get() action on the object resource retrieves an object.
Use the Range parameter to download a specified range of bytes of an object. For more information about the HTTP Range header, go to http://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html#sec14.35.
In this example, we retrieve the object "my_object" from the bucket "my_bucket" by calling the get() action on the object resource. The response is a Python dictionary, in which the 'Body' entry is a StreamingBody object containing the object data. We read the content of the object by calling the read() function on the 'Body'.
s3_resource.Object("my_bucket", "my_object").get()['Body'].read()
In this example, we download only the byte range 32-64 of the object "my_object" from the bucket "my_bucket".
s3_resource.Object("my_bucket", "my_object").get(Range='bytes=32-64')['Body'].read()
To retrieve metadata from an object without returning the object itself, use the metadata attribute on the object resource.
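A sketch of reading back the user metadata stored with the put_object() example above (the helper name object_checksum is ours; s3_resource is assumed to be a resource created as shown earlier):

```python
def object_checksum(s3_resource, bucket_name, key):
    """Read back user metadata stored with an object, e.g. a checksum.

    Accessing the metadata attribute performs a HEAD request; the
    object body itself is not downloaded.
    """
    s3_object = s3_resource.Object(bucket_name, key)
    return s3_object.metadata.get('checksum')
```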
To delete multiple objects in a bucket, call the delete_objects() action on the bucket resource.
s3_resource.Bucket("the_bucket").delete_objects( Delete={ 'Objects': [ {'Key': 'file1'}, {'Key': 'file2'}, {'Key': 'file3'}, ], }, )
Before setting ACL permissions, we recommend you read S3 Access Control Lists (ACLs).
To learn about VAST Cluster's support for S3 ACLs, read S3 Access Control Lists (ACLs).
The ObjectAcl() method on the S3 service resource creates a resource representing an S3 object's ACL. Alternatively, you can create the ObjectAcl resource by calling the Acl() method on the object resource. The put() action on an ObjectAcl resource sets the ACL permissions for an object that already exists in a bucket.
Syntax Notes
To grant permission to a user, specify CanonicalUser as the 'Type' and pass the user's VID as the 'ID'.
To grant permission to a predefined group, specify Group as the 'Type' and pass the group's URI as the 'URI':
- For the All Users group: 'http://acs.amazonaws.com/groups/global/AllUsers'
- For the Authenticated Users group: 'http://acs.amazonaws.com/groups/global/AuthenticatedUsers'
In this example, a user with VID 3 is granted full control permission to the object my_object in the bucket my_bucket owned by JDoe whose VID is 2.
s3_resource.ObjectAcl('my_bucket', 'my_object').put(
    AccessControlPolicy={
        'Grants': [
            {
                'Grantee': {
                    'ID': '3',
                    'Type': 'CanonicalUser'
                },
                'Permission': 'FULL_CONTROL'
            },
        ],
        'Owner': {
            'DisplayName': 'JDoe',
            'ID': '2'
        }
    },
)
In this example, the predefined Authenticated Users group is granted WRITE permission to the object my_object in the bucket my_bucket owned by JDoe whose VID is 2.
s3_resource.ObjectAcl('my_bucket', 'my_object').put(
    AccessControlPolicy={
        'Grants': [
            {
                'Grantee': {
                    'Type': 'Group',
                    'URI': 'http://acs.amazonaws.com/groups/global/AuthenticatedUsers'
                },
                'Permission': 'WRITE'
            },
        ],
        'Owner': {
            'DisplayName': 'JDoe',
            'ID': '2'
        }
    },
)
You can use the grants attribute of the ACL resource to access grantee information, and the owner attribute of the ACL resource to access owner information.
In this example, we iterate through the ACL grants and print out the user name and permission of each grantee who has permission to access the object.
acl = s3_resource.Object("my_bucket", "my_object").Acl()
for grant in acl.grants:
    print(grant['Grantee']['DisplayName'], grant['Permission'])
Call the initiate_multipart_upload() action on the object resource to initiate a multipart upload and return an upload ID. After initiating the multipart upload, you then need to upload all parts and then complete the upload.
To abort a multipart upload after it was initiated, call the abort() action on the MultipartUpload resource.
To complete a multipart upload after it was initiated and all parts were uploaded, call the complete() action on the MultipartUpload resource. Completing a multipart upload assembles the previously uploaded parts.
To upload a part in a multipart upload that was already initiated, create a MultipartUploadPart sub resource of the MultipartUpload resource and call the upload() action on the MultipartUploadPart resource.
After uploading all parts, the upload needs to be completed.
To upload a part of a multipart upload by copying data from an existing object as data source, create a MultipartUploadPart sub resource of the MultipartUpload resource and call the copy_from() action on the MultipartUploadPart resource.
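The resource-interface steps above can be sketched together (the helper name multipart_upload_via_resource is ours; s3_resource is assumed to be a resource created as shown earlier; Part() is the MultipartUpload sub-resource that yields a MultipartUploadPart):

```python
def multipart_upload_via_resource(s3_resource, bucket_name, key, chunks):
    """Multipart upload using the resource interface's sub-resources."""
    # initiate_multipart_upload() returns a MultipartUpload resource.
    mpu = s3_resource.Object(bucket_name, key).initiate_multipart_upload()
    parts = []
    for part_number, chunk in enumerate(chunks, start=1):
        # Part(part_number) creates a MultipartUploadPart sub-resource.
        response = mpu.Part(part_number).upload(Body=chunk)
        parts.append({'PartNumber': part_number, 'ETag': response['ETag']})
    # Completing the upload assembles the previously uploaded parts.
    return mpu.complete(MultipartUpload={'Parts': parts})
```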
The collection parts available on a MultipartUpload resource is the collection of all parts in the multipart upload. It has a method called all() that creates an iterable of all the parts in the multipart upload. To list all parts in a multipart upload, you can iterate through the list of parts returned by multipart_upload.parts.all().
The collection multipart_uploads is the collection of all multipart uploads in a bucket and it has a method called all() that creates an iterable of all the multipart uploads in the collection. To list all currently initiated multipart uploads in a bucket, you can iterate through the list of multipart uploads returned by bucket.multipart_uploads.all().
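A sketch of iterating that collection (the helper name pending_uploads is ours; the bucket argument is assumed to be an s3_resource.Bucket(...) instance; object_key and id are the MultipartUpload resource's identifiers):

```python
def pending_uploads(bucket):
    """List (key, upload id) for all in-progress multipart uploads in a bucket.

    bucket is assumed to be an s3_resource.Bucket(...) instance.
    """
    return [(mpu.object_key, mpu.id) for mpu in bucket.multipart_uploads.all()]
```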