In recent times, we’ve witnessed a big shift in how enterprises handle and analyze their ever-growing knowledge lakes. On the forefront of this transformation is Apache Iceberg, an open desk format that’s quickly gaining traction amongst large-scale knowledge shoppers.
Nevertheless, as enterprises scale their knowledge lake implementations, managing these Iceberg tables at scale turns into difficult. Knowledge groups usually must handle desk schema evolution, its partitioning, and snapshots variations. Automation streamlines these operations, gives consistency, reduces human error, and helps knowledge groups deal with higher-value duties.
The AWS Glue Knowledge Catalog now helps Iceberg desk administration utilizing the AWS Glue API, AWS SDKs, and AWS CloudFormation. Beforehand, customers needed to create Iceberg tables within the Knowledge Catalog with out partitions utilizing CloudFormation or SDKs and later add partitions from Amazon Athena or different analytics engines. This prevents the desk lineage from being tracked in a single place and provides steps outdoors automation within the steady integration and supply (CI/CD) pipeline for desk upkeep operations. With the launch, AWS Glue prospects can now use their most popular automation or infrastructure as code (IaC) instruments to automate Iceberg desk creation with partitions and use the identical instruments to handle schema updates and kind order.
On this submit, we present the right way to create and replace Iceberg tables with partitions within the Knowledge Catalog utilizing the AWS SDK and CloudFormation.
Resolution overview
Within the following sections, we illustrate the AWS SDK for Python (Boto3) and AWS Command Line Interface (AWS CLI) utilization of Knowledge Catalog APIs—CreateTable() and UpdateTable()—for Amazon Easy Storage Service (Amazon S3) primarily based Iceberg tables with partitions. We additionally present the CloudFormation templates to create and replace an Iceberg desk with partitions.
Conditions
The Knowledge Catalog API adjustments are made accessible within the following variations of the AWS CLI and SDK for Python:
- AWS CLI model of two.27.58 or above
- SDK for Python model of 1.39.12 or above
AWS CLI utilization
Let’s create an Iceberg desk with one partition, utilizing CreateTable() within the AWS CLI:
The createicebergtable.json is as follows:
The previous AWS CLI command creates the metadata folder for the Iceberg desk in Amazon S3, as proven within the following screenshot.

You possibly can populate the desk with values as follows and confirm the desk schema utilizing the Athena console:
The next screenshot exhibits the outcomes.

After populating the desk with knowledge, you’ll be able to examine the S3 prefix of the desk, which is able to now have the knowledge folder.

The knowledge folders partitioned in line with our desk definition and Parquet knowledge information created from our INSERT command can be found beneath every partitioned prefix.

Subsequent, we replace the Iceberg desk by including a brand new partition, utilizing UpdateTable():
The updateicebergtable.json is as follows.
UpdateTable() modifies the desk schema by including a metadata JSON file to the underlying metadata folder of the desk in Amazon S3.

We insert values into the desk utilizing Athena as follows:
The next screenshot exhibits the outcomes.

Examine the corresponding adjustments to the knowledge folder within the Amazon S3 location of the desk.

This instance has illustrated the right way to create and replace Iceberg tables with partitions utilizing AWS CLI instructions.
SDK for Python utilization
The next Python scripts illustrate utilizing CreateTable() and UpdateTable() for an Iceberg desk with partitions:
CloudFormation utilization
Use the next CloudFormation templates for CreateTable() and UpdateTable(). After the CreateTable template is full, replace the identical stack with the UpdateTable template by creating a brand new changeset in your stack and executing it.
Clear up
To keep away from incurring prices on the Iceberg tables created utilizing the AWS CLI, delete the tables from the Knowledge Catalog.
Conclusion
On this submit, we illustrated the right way to use the AWS CLI to create and replace Iceberg tables with partitions within the Knowledge Catalog. We additionally offered the SDK for Python and CloudFormation pattern code and templates. We hope this helps you automate the creation and administration of your Iceberg tables with partitions in your CI/CD pipelines and manufacturing environments. Strive it out in your personal use case and share your suggestions within the feedback part.
In regards to the authors
Acknowledgements: A particular because of everybody who contributed to the event and launch of this function – Purvaja Narayanaswamy, Sachet Saurabh, Akhil Yendluri and Mohit Chandak.