Wednesday, October 11, 2023

AWS: Simpler S3 File Deletes by Prefix

I came across this code that deletes files in an S3 bucket based on a list of prefixes:

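import boto3

# data_bucket (bucket name) and prefix_list (list of key prefixes)
# are assumed to be defined earlier.
s3_client = boto3.client('s3')
for prefix in prefix_list:
    # List the objects under this prefix, one page at a time
    paginator = s3_client.get_paginator('list_objects_v2')
    file_list = paginator.paginate(
        Bucket=data_bucket,
        Prefix=prefix
    )
    for current_content in file_list:
        # 'Contents' is missing when a page has no matching objects
        for current_file in current_content.get('Contents', []):
            current_key = current_file['Key']
            # One delete request per object
            response = s3_client.delete_object(
                Bucket=data_bucket,
                Key=current_key
            )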

The code creates an S3 client and then, for each prefix in the list, creates a paginator. Paginators are great because a single listing call returns at most 1,000 keys; the paginator fetches the results page by page, so you avoid holding the whole listing in memory when the list of files is big. Using this paginator, the code walks through all the files under the prefix and deletes them one at a time, with one request per file.
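To look at the pagination part on its own, here is a minimal sketch that wraps the listing in a generator. The helper name iter_keys and its parameters are my own invention for illustration, not part of boto3; it only assumes a client created with boto3.client('s3').

```python
def iter_keys(s3_client, bucket_name, prefix):
    """Yield every object key under a prefix, fetching one page at a time.

    iter_keys is a hypothetical helper, not a boto3 function.
    """
    paginator = s3_client.get_paginator('list_objects_v2')
    for page in paginator.paginate(Bucket=bucket_name, Prefix=prefix):
        # A page with no matching objects has no 'Contents' entry.
        for entry in page.get('Contents', []):
            yield entry['Key']
```

Because it is a generator, only the current page is held in memory; something like `for key in iter_keys(s3_client, data_bucket, prefix): ...` would walk the keys lazily.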

There is nothing wrong with this code; it works nicely. My only remark is that there is a simpler way. Instead of using the S3 client, you can create an S3 resource and get a Bucket object from it. From there, you can delete all the files listed under a prefix using a simple filter:

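import boto3

# data_bucket and prefix_list are assumed to be defined earlier.
s3_resource = boto3.resource('s3')
bucket = s3_resource.Bucket(data_bucket)
for prefix in prefix_list:
    # Select every object under the prefix and delete the whole selection
    bucket.objects.filter(Prefix=prefix).delete()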

Simpler!
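
If you have to stay with the client for some reason, there is a middle ground. As far as I understand, the resource's delete() sends the keys to S3 in batches instead of one request per file, and you can do something similar yourself with delete_objects, which accepts up to 1,000 keys per call. The sketch below is only an illustration: the function name delete_prefix_in_batches is made up, and it assumes an unversioned bucket and a client created with boto3.client('s3').

```python
def delete_prefix_in_batches(s3_client, bucket_name, prefix):
    """Delete all objects under a prefix with one delete_objects call per page.

    Hypothetical helper for illustration. Returns the per-key errors
    reported by S3 (an empty list if every delete succeeded).
    """
    errors = []
    paginator = s3_client.get_paginator('list_objects_v2')
    for page in paginator.paginate(Bucket=bucket_name, Prefix=prefix):
        # A listing page holds at most 1,000 keys, which is also the
        # maximum that a single delete_objects request accepts.
        keys = [{'Key': entry['Key']} for entry in page.get('Contents', [])]
        if not keys:
            continue
        response = s3_client.delete_objects(
            Bucket=bucket_name,
            # Quiet mode: the response lists only the keys that failed.
            Delete={'Objects': keys, 'Quiet': True},
        )
        errors.extend(response.get('Errors', []))
    return errors
```

It is longer than the one-liner above, so I would still reach for the resource version first, but it gives you a place to inspect the errors S3 reports for individual keys.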
