Efficiently Encrypt Millions of S3 Objects: A Step-by-Step Guide Using AWS S3 Batch Operations and CMK

Introduction
Prerequisites
- Setting up inventory reports on S3 bucket
Generating the manifest file
- Setting up Glue resources
Conclusion
About the Author

Introduction

Keeping data secure is crucial in today’s digital landscape. In this blog post, we will explore how to use Amazon S3 Batch Operations together with an AWS KMS customer master key (CMK) to efficiently encrypt millions of existing objects in your S3 buckets. By following the step-by-step guide below, you’ll be able to bring those objects under the encryption key that your security and compliance requirements call for. Let’s dive in!

Prerequisites

To get started with the encryption process, there are a few prerequisites you need to have in place. Make sure you have:

Created a CMK in AWS Key Management Service (KMS).
Set up an inventory report on the S3 bucket containing the objects that need to be re-encrypted using the CMK.
Configured the required CMK as the default encryption key on the S3 bucket.

The first and third steps above are quite simple, so let’s go through the detailed steps for setting up an inventory report in Amazon S3.
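For readers who prefer scripting the third prerequisite, here is a minimal sketch using boto3's `put_bucket_encryption` call. The function names, the bucket name, and the key ARN are hypothetical placeholders, and enabling S3 Bucket Keys is an assumption rather than something this guide requires.

```python
def build_default_encryption(kms_key_arn: str) -> dict:
    """Return a default-encryption configuration that uses SSE-KMS with the given key."""
    return {
        "Rules": [
            {
                "ApplyServerSideEncryptionByDefault": {
                    "SSEAlgorithm": "aws:kms",
                    "KMSMasterKeyID": kms_key_arn,
                },
                # Assumption: S3 Bucket Keys are enabled to reduce KMS request volume.
                "BucketKeyEnabled": True,
            }
        ]
    }


def apply_default_encryption(bucket: str, kms_key_arn: str) -> None:
    """Apply the configuration to a bucket (needs boto3 and AWS credentials)."""
    import boto3  # imported lazily so the builder above works without boto3

    s3 = boto3.client("s3")
    s3.put_bucket_encryption(
        Bucket=bucket,
        ServerSideEncryptionConfiguration=build_default_encryption(kms_key_arn),
    )
```

Calling `apply_default_encryption("example-bucket", "<your key ARN>")` would set the default key. Objects already in the bucket are not re-encrypted by this setting, which is why the rest of this guide uses a batch job.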

Setting up inventory reports on S3 bucket

Log in to the AWS Management Console and navigate to the Amazon S3 dashboard.
Select the S3 bucket that contains the objects you want to re-encrypt.
Click on the “Management” tab located at the top of the S3 bucket dashboard.
Scroll down to the “Inventory” section and click on “Create inventory configuration” to create a new inventory configuration.
In the “Create inventory configuration” window, provide the following information:

Report name: Choose a descriptive name for your inventory report configuration.
Prefix: Optionally, provide a prefix to limit the inventory to objects under that prefix.
Included objects: Specify the objects to be included in the inventory report. You can choose to include all objects or use a prefix to filter specific objects.

Destination bucket: Select the bucket where you want to store the inventory reports. You can choose an existing bucket or create a new one. The console creates an example bucket policy for you, which you can customize as needed.

Schedule: Choose the frequency for generating the inventory report. You can select daily or weekly.
Format: Select the format for the inventory report. You can choose CSV, ORC (Apache Hive), or Parquet.

Encryption: Enable encryption for the inventory report if required. You can choose to use server-side encryption with S3-managed keys or provide your own KMS key.
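Optional fields: Customize the inventory report by selecting the desired optional fields such as size, last modified date, storage class, and encryption status. Select last modified date if you plan to filter objects by timestamp later.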

Once you have provided all the necessary information, review the configuration settings. Click on the “Save” button to create the inventory report configuration. The first inventory report can take up to 48 hours to be delivered, depending on the size of your bucket.
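The same inventory configuration can also be created programmatically. The sketch below assumes boto3's `put_bucket_inventory_configuration` call; the function names, the choice of CSV format, current versions only, and the particular optional fields are illustrative assumptions, not settings this guide prescribes.

```python
def build_inventory_configuration(report_name, destination_bucket,
                                  prefix=None, frequency="Daily"):
    """Build an S3 Inventory configuration for a CSV report."""
    if frequency not in ("Daily", "Weekly"):
        raise ValueError("S3 Inventory supports only Daily or Weekly schedules")
    config = {
        "Id": report_name,
        "IsEnabled": True,
        "IncludedObjectVersions": "Current",  # assumption: current versions only
        "Schedule": {"Frequency": frequency},
        "Destination": {
            "S3BucketDestination": {
                "Bucket": f"arn:aws:s3:::{destination_bucket}",
                "Format": "CSV",
            }
        },
        # Assumption: these optional fields are enough for the later queries.
        "OptionalFields": ["Size", "LastModifiedDate", "StorageClass",
                           "EncryptionStatus"],
    }
    if prefix:
        config["Filter"] = {"Prefix": prefix}
    return config


def apply_inventory_configuration(source_bucket, config):
    """Attach the configuration to the source bucket (needs boto3 and credentials)."""
    import boto3  # imported lazily so the builder above works without boto3

    s3 = boto3.client("s3")
    s3.put_bucket_inventory_configuration(
        Bucket=source_bucket,
        Id=config["Id"],
        InventoryConfiguration=config,
    )
```

Unlike the console, this call does not add a bucket policy to the destination bucket, so that policy would need to be set up separately.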

Generating the manifest file

Once the inventory report is generated, we can build the manifest file that the encryption job will use. Follow these steps:

Setting up Glue resources

Create a Glue Database

Go to the AWS Management Console and navigate to the AWS Glue service.
Click on “Databases” in the left navigation pane.
Select “Add database” and provide a name for your database, such as “InventoryDB.”
Click on “Create” to create the Glue database.

Create an IAM role for the Crawler

Open the IAM Management Console in a new tab.
Click on “Roles” in the left navigation pane.
Select “Create role” and choose “AWS service” as the trusted entity type.
In the list of services, search for and select “Glue.”
Under “Permissions,” select the required policies to grant the necessary permissions to access the S3 bucket where the inventory report is stored. This may include policies such as “AmazonS3ReadOnlyAccess.”
Provide a name for the IAM role, such as “GlueCrawlerRole,” and click on “Create role.”
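The role creation steps above can be sketched with boto3's IAM client as follows. The function names are hypothetical. Attaching the AWS managed policy `AWSGlueServiceRole` alongside `AmazonS3ReadOnlyAccess` is an assumption on my part, since a crawler typically also needs permission to write to the Glue Data Catalog.

```python
import json

S3_READ_ONLY_POLICY_ARN = "arn:aws:iam::aws:policy/AmazonS3ReadOnlyAccess"
# Assumption: the crawler also needs the Glue service policy to update the catalog.
GLUE_SERVICE_POLICY_ARN = "arn:aws:iam::aws:policy/service-role/AWSGlueServiceRole"


def build_glue_trust_policy() -> dict:
    """Trust policy that lets the AWS Glue service assume the role."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"Service": "glue.amazonaws.com"},
                "Action": "sts:AssumeRole",
            }
        ],
    }


def create_crawler_role(role_name: str = "GlueCrawlerRole") -> str:
    """Create the role and attach both policies (needs boto3 and credentials)."""
    import boto3  # imported lazily so the policy builder works without boto3

    iam = boto3.client("iam")
    role = iam.create_role(
        RoleName=role_name,
        AssumeRolePolicyDocument=json.dumps(build_glue_trust_policy()),
    )
    for policy_arn in (S3_READ_ONLY_POLICY_ARN, GLUE_SERVICE_POLICY_ARN):
        iam.attach_role_policy(RoleName=role_name, PolicyArn=policy_arn)
    return role["Role"]["Arn"]
```

If the inventory report itself is encrypted with a KMS key, the role would additionally need decrypt permission on that key.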

Create a Glue Crawler

Return to the AWS Glue service console.
Click on “Crawlers” in the left navigation pane.
Select “Add crawler” to create a new crawler.
Provide a name for the crawler, such as “InventoryCrawler.”
Choose the data source as “S3” and specify the S3 bucket where the inventory report is stored.
Select the IAM role you created earlier (GlueCrawlerRole) for the crawler.
Configure other settings as needed and click on “Next.”
In the “Add a database” step, select the Glue database you created (InventoryDB).
Complete the remaining configuration steps and click on “Finish” to create the crawler.
Click “Run crawler” to start the crawling process.
Wait for the crawler to complete, as it analyzes the inventory report and creates a table in the Glue database based on the schema.
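As a rough scripted equivalent of the database and crawler steps, the sketch below uses boto3's Glue client (`create_database`, `create_crawler`, `start_crawler`, `get_crawler`). The function names, the S3 path, and the 30-second polling interval are assumptions made for illustration.

```python
import time


def build_crawler_request(name, role, database, s3_path):
    """Build the arguments for creating a crawler over the inventory report."""
    return {
        "Name": name,
        "Role": role,
        "DatabaseName": database,
        "Targets": {"S3Targets": [{"Path": s3_path}]},
    }


def create_and_run_crawler(request, poll_seconds=30):
    """Create the database and crawler, run it, and wait until it is idle again.

    Needs boto3 and credentials. create_database raises an error if the
    database already exists, so skip that call when reusing a database.
    """
    import boto3  # imported lazily so the builder above works without boto3

    glue = boto3.client("glue")
    glue.create_database(DatabaseInput={"Name": request["DatabaseName"]})
    glue.create_crawler(**request)
    glue.start_crawler(Name=request["Name"])
    while True:
        state = glue.get_crawler(Name=request["Name"])["Crawler"]["State"]
        if state == "READY":  # the crawler returns to READY once the run ends
            return
        time.sleep(poll_seconds)
```

A request could be built with `build_crawler_request("InventoryCrawler", "GlueCrawlerRole", "InventoryDB", "s3://<inventory bucket>/<report prefix>/")`, where the S3 path is a placeholder for wherever your report is delivered.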

Run Athena queries and save results to S3

Navigate to the Amazon Athena service console.
Select the Glue database you created (InventoryDB) as the database to query.
Write and execute Athena queries to analyze the data and generate the desired manifest file.
Once the query result is generated, save it to an S3 bucket of your choice using the workgroup setting.

Below is a sample query to run in the Athena query editor:

SELECT replace(col0, '"', '') AS bucket, replace(col1, '"', '') AS key
FROM "InventoryDB"."InventoryTable"

If you want to select objects based on when they were uploaded or last modified, you can use the query below to filter the data (here, col3 is the column holding the last modified timestamp):

SELECT replace(col0, '"', '') AS bucket, replace(col1, '"', '') AS key
FROM "InventoryDB"."InventoryTable"
WHERE from_iso8601_timestamp(replace(col3, '"', '')) >= TIMESTAMP '2023-05-25 00:00:00'
  AND from_iso8601_timestamp(replace(col3, '"', '')) < TIMESTAMP '2023-05-28 00:00:00';
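If you would rather submit these queries from a script than from the query editor, a sketch along the following lines may help. It uses boto3's Athena client (`start_query_execution`, `get_query_execution`); the function names and the output location are hypothetical, and the query text is assembled by plain string formatting, so it should only be given trusted values.

```python
import time


def build_manifest_query(database, table, start=None, end=None):
    """Build the bucket/key query, optionally filtered on the timestamp in col3."""
    query = (
        "SELECT replace(col0, '\"', '') AS bucket, replace(col1, '\"', '') AS key\n"
        f'FROM "{database}"."{table}"'
    )
    if start and end:
        ts = "from_iso8601_timestamp(replace(col3, '\"', ''))"
        query += (
            f"\nWHERE {ts} >= TIMESTAMP '{start}'"
            f"\n  AND {ts} < TIMESTAMP '{end}'"
        )
    return query


def run_query(query, database, output_location, poll_seconds=5):
    """Run the query and return its final state (needs boto3 and credentials)."""
    import boto3  # imported lazily so the builder above works without boto3

    athena = boto3.client("athena")
    execution_id = athena.start_query_execution(
        QueryString=query,
        QueryExecutionContext={"Database": database},
        ResultConfiguration={"OutputLocation": output_location},
    )["QueryExecutionId"]
    while True:
        execution = athena.get_query_execution(QueryExecutionId=execution_id)
        state = execution["QueryExecution"]["Status"]["State"]
        if state in ("SUCCEEDED", "FAILED", "CANCELLED"):
            return state
        time.sleep(poll_seconds)
```

Athena writes the result as a CSV file named after the query execution ID under the output location you pass in.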

Once the Athena queries have completed, download the query result and remove the column headers, as an S3 Batch Operations manifest must not contain a header row. You can use shell commands to remove the first row from the CSV file.
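As an alternative to shell commands, the header row can be dropped with a few lines of Python. This is a minimal sketch with a hypothetical function name; it assumes the downloaded result is a standard CSV file whose first row is the header, and it rewrites the remaining rows with minimal quoting.

```python
import csv
import io


def strip_header(source, target) -> int:
    """Copy CSV rows from source to target, skipping the first (header) row.

    Returns the number of data rows written.
    """
    reader = csv.reader(source)
    writer = csv.writer(target, lineterminator="\n")
    next(reader, None)  # discard the header row, if any
    written = 0
    for row in reader:
        if row:  # ignore blank lines
            writer.writerow(row)
            written += 1
    return written


# In-memory demonstration; with real files, open both with newline="".
sample = io.StringIO('"bucket","key"\n"my-bucket","a/b.txt"\n')
result = io.StringIO()
strip_header(sample, result)
print(result.getvalue(), end="")  # my-bucket,a/b.txt
```

The resulting file can then be uploaded to S3 and used as the CSV manifest for the batch job.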

Now that we have our manifest file ready, let’s create a job!

Create an S3 Batch Operations job using the manifest file

Navigate to the Amazon S3 service in the AWS Management Console.
Click on “Batch Operations” in the left-hand menu and select “Create job” to initiate the S3 batch job creation.
Provide the required information as follows:

Manifest file type and location: Select the type of manifest file, either an S3 inventory report or a CSV file, along with the location where the manifest is stored.

Type of operation, destination bucket, and storage class: Choose copy as the operation and provide the location to which the objects will be copied and encrypted with the configured key. This can be the source bucket itself or another bucket, depending on your requirements.

In the next steps, keep everything at the defaults, since we have already configured the required KMS key for the bucket. Provide a description and a priority for the job.

Completion report: Select the scope of the report generated once the job completes, either failed tasks only or all tasks. We select the first option, as we don’t need any post-job actions on the successful tasks. Provide the path where the completion report should be saved.

IAM role: Provide an IAM role for this job. It must have permission to perform the required operations on all the buckets involved. This includes the bucket where the inventory report or manifest is saved, the bucket to which objects are copied, and the bucket where the completion report is saved. Finally, review all details and click next.

Once the job has been created, run it and monitor its progress in the console.
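The job settings described above map onto boto3's S3 Control `create_job` call roughly as sketched below. All function names, ARNs, the priority, and the report prefix are hypothetical placeholders. Passing the KMS key explicitly is optional here, since the guide relies on the bucket's default encryption.

```python
import uuid


def build_copy_job_request(account_id, role_arn, manifest_arn, manifest_etag,
                           target_bucket_arn, report_bucket_arn,
                           kms_key_arn=None, report_prefix="batch-reports"):
    """Build the arguments for a copy job driven by a CSV manifest."""
    copy_operation = {"TargetResource": target_bucket_arn}
    if kms_key_arn:
        # Optional: omit to rely on the target bucket's default encryption.
        copy_operation["SSEAwsKmsKeyId"] = kms_key_arn
    return {
        "AccountId": account_id,
        "ConfirmationRequired": True,  # the job waits until you run it
        "Description": "Re-encrypt objects with the CMK",
        "Priority": 10,
        "RoleArn": role_arn,
        "ClientRequestToken": str(uuid.uuid4()),
        "Operation": {"S3PutObjectCopy": copy_operation},
        "Manifest": {
            "Spec": {
                "Format": "S3BatchOperations_CSV_20180820",
                "Fields": ["Bucket", "Key"],
            },
            "Location": {"ObjectArn": manifest_arn, "ETag": manifest_etag},
        },
        "Report": {
            "Bucket": report_bucket_arn,
            "Prefix": report_prefix,
            "Format": "Report_CSV_20180820",
            "Enabled": True,
            "ReportScope": "FailedTasksOnly",
        },
    }


def create_job(request) -> str:
    """Submit the job and return its ID (needs boto3 and credentials)."""
    import boto3  # imported lazily so the builder above works without boto3

    s3control = boto3.client("s3control")
    return s3control.create_job(**request)["JobId"]
```

The manifest ETag is the ETag of the uploaded manifest object, which you can read from the object's properties in S3.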

Conclusion

By following the outlined steps in this blog post, you have successfully encrypted millions of objects in your S3 buckets using AWS S3 Batch Operations and CMK. This process ensures the confidentiality and integrity of your data, providing a robust security layer. Keep exploring the vast capabilities of AWS to enhance the protection of your valuable assets.

About the Author

Deepali Sonune is a DevOps engineer with 10+ years of industry experience. She has been developing high-performance DevOps solutions with stringent security and governance requirements in AWS for 7+ years. She also works with developers and IT to oversee code releases, combining an understanding of both engineering and programming.

QloudX
