AWS re:Invent 2024, the flagship annual conference, took place December 2–6, 2024, in Las Vegas, bringing together thousands of cloud enthusiasts, innovators, and industry leaders from around the globe. This premier event showcased groundbreaking announcements, keynotes from AWS leadership, hands-on technical sessions, and exciting product launches.
Analytics remained one of the key focus areas this year, with significant updates and innovations aimed at helping businesses harness their data more efficiently and accelerate insights. From enhancing data lakes to empowering AI-driven analytics, AWS unveiled new tools and services that are set to shape the future of data and analytics.
In this post, we walk you through the top analytics announcements from re:Invent 2024 and explore how these innovations can help you unlock the full potential of your data.
Amazon SageMaker
Introducing the next generation of Amazon SageMaker
AWS announces the next generation of Amazon SageMaker, a unified platform for data, analytics, and AI. This launch brings together widely adopted AWS machine learning (ML) and analytics capabilities and provides an integrated experience for analytics and AI with unified access to data and built-in governance.
The next generation of SageMaker also introduces new capabilities, including Amazon SageMaker Unified Studio (preview), Amazon SageMaker Lakehouse, and Amazon SageMaker Data and AI Governance. Amazon SageMaker Unified Studio brings together functionality and tools from the range of standalone studios, query editors, and visual tools available today in Amazon EMR, AWS Glue, Amazon Redshift, Amazon Bedrock, and the existing Amazon SageMaker Studio. Amazon SageMaker Lakehouse provides an open data architecture that reduces data silos and unifies data across Amazon Simple Storage Service (Amazon S3) data lakes, Redshift data warehouses, and third-party and federated data sources. Amazon SageMaker Data and AI Governance, including Amazon SageMaker Catalog built on Amazon DataZone, empowers you to securely discover, govern, and collaborate on data and AI workflows.
Amazon DynamoDB zero-ETL integration with Amazon SageMaker Lakehouse
Amazon DynamoDB zero-ETL integration with SageMaker Lakehouse automates the extraction and loading of data from a DynamoDB table into SageMaker Lakehouse, an open and secure lakehouse. Using the no-code interface, you can maintain an up-to-date replica of your DynamoDB data in the data lake by quickly setting up your integration to handle the complete process of replicating data and updating records. This zero-ETL integration reduces the complexity and operational burden of data replication to let you focus on deriving insights from your data. You can create and manage integrations using the AWS Management Console, the AWS Command Line Interface (AWS CLI), or the SageMaker Lakehouse APIs.
Amazon S3 Tables
Amazon S3 Tables – Fully managed Apache Iceberg tables optimized for analytics workloads
Amazon S3 Tables deliver the first cloud object store with built-in Apache Iceberg support, and the easiest way to store tabular data at scale. S3 Tables are specifically optimized for analytics workloads, resulting in up to 3 times faster query throughput and up to 10 times higher transactions per second compared to self-managed tables. S3 Tables are designed to perform continual table maintenance to automatically optimize query efficiency and storage cost over time, even as your data lake scales and evolves. S3 Tables integration with the AWS Glue Data Catalog is in preview, allowing you to stream, query, and visualize data, including Amazon S3 Metadata tables, using AWS analytics services such as Amazon Data Firehose, Amazon Athena, Amazon Redshift, Amazon EMR, and Amazon QuickSight.
Amazon S3 Metadata (Preview) – Easiest and fastest way to manage your metadata
Amazon S3 Metadata is the easiest and fastest way to help you instantly discover and understand your S3 data with automated, queryable metadata that updates in near real time. S3 Metadata supports object metadata, which includes system-defined details like size and the source of the object, and custom metadata, which allows you to use tags to annotate your objects with information like product SKU, transaction ID, or content rating, for example.
S3 Metadata is designed to automatically capture metadata from objects as they’re uploaded into a bucket, and to make that metadata queryable in a read-only table. These metadata tables are stored in S3 Tables, the new S3 storage offering optimized for tabular data. Additionally, S3 Metadata integrates with Amazon Bedrock, allowing for the annotation of AI-generated videos with metadata that specifies its AI origin, creation timestamp, and the specific model used for its generation.
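Because the captured metadata lands in a read-only table, finding objects by tag becomes an ordinary SQL query. The following is a minimal sketch that only builds the query text; it does not call AWS. The table reference, the tag key and value, and the column names (key, size, last_modified_date, object_tags) are assumptions for illustration, so check them against the schema of your own metadata table before running the query in a service such as Athena.

```python
def build_tag_query(metadata_table: str, tag_key: str, tag_value: str, limit: int = 100) -> str:
    """Build SQL that lists objects carrying a given tag.

    Column names are assumed, not taken from the post; verify them
    against your metadata table's schema.
    """
    def quote(value: str) -> str:
        # Double any single quotes so the value is a valid SQL string literal.
        return "'" + value.replace("'", "''") + "'"

    return (
        "SELECT key, size, last_modified_date\n"
        f"FROM {metadata_table}\n"
        f"WHERE element_at(object_tags, {quote(tag_key)}) = {quote(tag_value)}\n"
        f"LIMIT {int(limit)}"
    )


# Hypothetical table reference and tag, for illustration only.
query = build_tag_query('"my_namespace"."my_metadata_table"', "sku", "ABC-123")
print(query)
```

You could submit the resulting string through whichever query engine you use to read the metadata table.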
AWS Glue
With AWS Glue 5.0, you get improved performance, enhanced security, support for SageMaker Unified Studio and SageMaker Lakehouse, and more. AWS Glue 5.0 enables you to develop, run, and scale your data integration workloads and get insights faster.
AWS Glue 5.0 upgrades the engines to Apache Spark 3.5.2, Python 3.11, and Java 17, with new performance and security improvements. It also updates open table format support to Apache Hudi 0.15.0, Apache Iceberg 1.6.1, and Delta Lake 3.2.0. AWS Glue 5.0 adds Spark native fine-grained access control with AWS Lake Formation so you can apply table-, column-, row-, and cell-level permissions on S3 data lakes. Finally, AWS Glue 5.0 adds support for SageMaker Lakehouse to unify all your data across S3 data lakes and Redshift data warehouses.
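To make the version upgrade concrete, here is a minimal sketch of a job definition that targets AWS Glue 5.0 with Iceberg enabled. It only builds the request dictionary; the job name, role ARN, script location, and worker settings are hypothetical placeholders, and the dictionary is shaped the way we would expect to pass it to the AWS SDK for Python (`boto3.client("glue").create_job(**job)`), which this sketch does not call.

```python
def glue5_job_definition(name: str, role_arn: str, script_s3_uri: str, workers: int = 2) -> dict:
    """Return a Glue job definition that pins the job to Glue version 5.0."""
    if workers < 2:
        raise ValueError("a Spark ETL job needs at least 2 workers")
    return {
        "Name": name,
        "Role": role_arn,
        "GlueVersion": "5.0",  # selects the Spark 3.5 / Python 3.11 runtime
        "Command": {
            "Name": "glueetl",  # Spark ETL job type
            "ScriptLocation": script_s3_uri,
            "PythonVersion": "3",
        },
        "WorkerType": "G.1X",
        "NumberOfWorkers": workers,
        # Job parameter that loads the bundled Iceberg libraries.
        "DefaultArguments": {"--datalake-formats": "iceberg"},
    }


# Hypothetical names, for illustration only.
job = glue5_job_definition(
    "nightly-orders-etl",
    "arn:aws:iam::123456789012:role/ExampleGlueRole",
    "s3://amzn-s3-demo-bucket/scripts/orders.py",
)
print(job["GlueVersion"])
```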
Amazon S3 Access Grants now integrate with AWS Glue
Amazon S3 Access Grants now integrate with AWS Glue for analytics, ML, and application development workloads in AWS. S3 Access Grants map identities from your identity provider (IdP), such as Entra ID and Okta or AWS Identity and Access Management (IAM) principals, to datasets stored in Amazon S3. This integration gives you the ability to manage Amazon S3 permissions for end-users running jobs with AWS Glue 5.0 or later, without the need to write and maintain bucket policies or individual IAM roles. When end-users in the appropriate user groups access Amazon S3 using AWS Glue ETL for Apache Spark, they automatically have the necessary permissions to read and write data.
AWS Glue Data Catalog now automates generating statistics for new tables
The AWS Glue Data Catalog now automates generating statistics for new tables. These statistics are integrated with the cost-based optimizer (CBO) from Amazon Redshift and Athena, resulting in improved query performance and potential cost savings. Previously, creating statistics for Iceberg tables in the Data Catalog required you to continuously monitor and update configurations for your tables. Now, the Data Catalog lets you generate statistics automatically for new tables with a one-time catalog configuration. Amazon Redshift and Athena use the updated statistics to optimize queries, using optimizations such as optimal join order or cost-based aggregation pushdown. The Data Catalog console gives you visibility into the updated statistics and statistics generation runs.
AWS expands data connectivity for Amazon SageMaker Lakehouse and AWS Glue
SageMaker Lakehouse announces unified data connectivity capabilities to streamline the creation, management, and usage of connections to data sources across databases, data lakes, and enterprise applications. SageMaker Lakehouse unified data connectivity provides a connection configuration template, support for standard authentication methods like basic authentication and OAuth 2.0, connection testing, metadata retrieval, and data preview. You can create SageMaker Lakehouse connections through SageMaker Unified Studio (preview), the AWS Glue console, or a custom-built application using APIs under AWS Glue.
With the ability to browse metadata, you can understand the structure and schema of the data source and identify relevant tables and fields. SageMaker Lakehouse unified connectivity is available where SageMaker Lakehouse or AWS Glue is available.
Announcing generative AI troubleshooting for Apache Spark in AWS Glue (Preview)
AWS Glue announces generative AI troubleshooting for Apache Spark, a new capability that helps data engineers and scientists quickly identify and resolve issues in their Spark jobs. Spark troubleshooting uses ML and generative AI technologies to provide automated root cause analysis for Spark job issues, along with actionable recommendations to fix identified issues. With Spark troubleshooting, you can initiate automated analysis of failed jobs with a single click on the AWS Glue console. Powered by Amazon Bedrock, Spark troubleshooting reduces debugging time from days to minutes.
The generative AI troubleshooting for Apache Spark preview is available for jobs running on AWS Glue 4.0.
Amazon EMR
Introducing Advanced Scaling in Amazon EMR Managed Scaling
We’re excited to announce Advanced Scaling, a new capability in Amazon EMR Managed Scaling that gives you increased flexibility to control the performance and resource utilization of your Amazon EMR on EC2 clusters. With Advanced Scaling, you can configure the desired resource utilization or performance levels for your cluster, and Amazon EMR Managed Scaling will use your intent to intelligently scale the cluster and optimize cluster compute resources.
Advanced Scaling is available with Amazon EMR release 7.0 and later in all AWS Regions where Amazon EMR Managed Scaling is available.
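The intent described above is expressed as part of the cluster’s managed scaling policy. The sketch below builds such a policy as a plain dictionary and does not call AWS. The `ScalingStrategy` and `UtilizationPerformanceIndex` field names and the allowed index values reflect our understanding of the Amazon EMR API rather than anything stated in this post, so treat them as assumptions and confirm them in the API reference; the capacity limits are hypothetical.

```python
# Assumed set of accepted index values: lower favors resource conservation,
# higher favors performance, 50 balances the two.
ALLOWED_INDEX_VALUES = (1, 25, 50, 75, 100)


def advanced_scaling_policy(min_units: int, max_units: int, index: int = 50) -> dict:
    """Return a managed scaling policy that opts in to Advanced Scaling."""
    if index not in ALLOWED_INDEX_VALUES:
        raise ValueError(f"index must be one of {ALLOWED_INDEX_VALUES}")
    if min_units > max_units:
        raise ValueError("min_units cannot exceed max_units")
    return {
        "ComputeLimits": {
            "UnitType": "Instances",
            "MinimumCapacityUnits": min_units,
            "MaximumCapacityUnits": max_units,
        },
        "ScalingStrategy": "ADVANCED",
        "UtilizationPerformanceIndex": index,
    }


# Hypothetical limits: favor performance on a cluster of 2 to 20 instances.
policy = advanced_scaling_policy(2, 20, index=75)
print(policy)
```

With the AWS SDK for Python, a dictionary like this would typically be passed as the `ManagedScalingPolicy` argument of `put_managed_scaling_policy`, alongside the cluster ID.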
Amazon Athena
Amazon SageMaker Lakehouse integrated access controls now available in Amazon Athena federated queries
SageMaker now supports connectivity, discovery, querying, and enforcing fine-grained data access controls on federated sources when querying data with Athena. Athena is a query service that makes it simple to analyze your data lake and federated data sources such as Amazon Redshift, DynamoDB, or Snowflake using SQL without extract, transform, and load (ETL) scripts. Now, data workers can connect to and unify these data sources within SageMaker Lakehouse. Federated source metadata is unified in SageMaker Lakehouse, where you apply fine-grained policies in one place, helping to streamline analytics workflows and secure your data.
Amazon Managed Service for Apache Flink
AWS announced support for a new Apache Flink connector for Amazon Managed Service for Prometheus. The new connector, contributed by AWS for the Flink open source project, adds Amazon Managed Service for Prometheus as a new destination for Flink. You can use the new connector to send processed data to an Amazon Managed Service for Prometheus destination starting with Flink version 1.19. With Amazon Managed Service for Apache Flink, you can transform and analyze data in real time. There are no servers and clusters to manage, and there is no compute and storage infrastructure to set up.
Amazon Managed Service for Apache Flink now delivers to Amazon SQS queues
AWS announced support for a new Flink connector for Amazon Simple Queue Service (Amazon SQS). The new connector, contributed by AWS for the Flink open source project, adds Amazon SQS as a new destination for Apache Flink. You can use the new connector to send processed data from Amazon Managed Service for Apache Flink to SQS messages with Flink, a popular framework and engine for processing and analyzing streaming data.
Amazon Managed Service for Apache Flink now offers a new Flink connector for Amazon Kinesis Data Streams. This open source connector, contributed by AWS, supports Flink 2.0 and provides several enhancements. It enables in-order reads during stream scale-up or scale-down, supports Flink’s native watermarking, and improves observability through unified connector metrics. Additionally, the connector uses the AWS SDK for Java 2.x, which supports enhanced performance and security features, and a native retry strategy. You can use the new connector to read data from a Kinesis data stream starting with Flink version 1.19.
Amazon Redshift
Amazon SageMaker Lakehouse and Amazon Redshift support for zero-ETL integrations from eight applications
SageMaker Lakehouse and Amazon Redshift now support zero-ETL integrations from applications, automating the extraction and loading of data from eight applications, including Salesforce, SAP, ServiceNow, and Zendesk. As an open, unified, and secure lakehouse for your analytics and AI initiatives, SageMaker Lakehouse enhances these integrations to streamline your data management processes. These zero-ETL integrations are fully managed by AWS and minimize the need to build ETL data pipelines. Optimize your data ingestion processes and focus instead on analysis and gaining insights.
Amazon Redshift multi-data warehouse writes through data sharing is now generally available
AWS announces the general availability of Amazon Redshift multi-data warehouse writes through data sharing. You can now start writing to Redshift databases from multiple Redshift data warehouses in just a few clicks. With Redshift multi-data warehouse writes through data sharing, you can keep ETL jobs more predictable by splitting workloads between multiple warehouses, helping you meet your workload performance requirements with less time and effort. Your data is immediately available across AWS accounts and Regions after it’s committed, enabling better collaboration across your organization.
Announcing Amazon Redshift Serverless with AI-driven scaling and optimization
Amazon Redshift Serverless introduces the next generation of AI-driven scaling and optimization in cloud data warehousing. Redshift Serverless uses AI techniques to automatically scale with workload changes across all key dimensions (such as data volume changes, number of concurrent users, and query complexity) to meet and maintain your price-performance targets. Amazon internal tests demonstrate that this optimization can provide you up to 10 times better price performance for variable workloads, without manual intervention.
Redshift Serverless with AI-driven scaling and optimization is available in all AWS Regions where Redshift Serverless is available.
Amazon Redshift now supports incremental refresh on Materialized Views (MVs) for data lake tables
Amazon Redshift now supports incremental refresh of materialized views on data lake tables. This capability helps you improve query performance for your data lake queries in a cost-effective and efficient manner. Because an incremental refresh applies only the changes made since the previous refresh instead of recomputing the whole view, you can keep materialized views up to date more efficiently and affordably.
Support for incremental refresh for materialized views on data lake tables is now available in all commercial Regions. To get started and learn more, visit Materialized views on external data lake tables in Amazon Redshift Spectrum.
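As a rough illustration, a materialized view over a data lake table is created and refreshed with the same SQL statements used for local tables. The sketch below only assembles those statements as strings; the view name, external schema, and query are hypothetical, and whether a particular refresh runs incrementally is decided by Amazon Redshift based on the view definition and the source table.

```python
def materialized_view_statements(view_name: str, select_sql: str) -> list[str]:
    """Return the CREATE and REFRESH statements for a materialized view."""
    # Normalize the query so the statement ends with exactly one semicolon.
    body = select_sql.strip().rstrip(";")
    create = f"CREATE MATERIALIZED VIEW {view_name} AS\n{body};"
    refresh = f"REFRESH MATERIALIZED VIEW {view_name};"
    return [create, refresh]


# Hypothetical external schema and table, for illustration only.
statements = materialized_view_statements(
    "sales_by_region",
    "SELECT region, SUM(amount) AS total FROM lake_schema.sales GROUP BY region;",
)
for statement in statements:
    print(statement)
```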
AWS announces Amazon Redshift integration with Amazon Bedrock for generative AI
AWS announces the integration of Amazon Redshift with Amazon Bedrock, a fully managed service offering high-performing foundation models (FMs), making it simpler and faster for you to build generative AI applications. This integration enables you to use large language models (LLMs) from simple SQL commands alongside your data in Amazon Redshift.
The Amazon Redshift integration with Amazon Bedrock is now generally available in all Regions where Amazon Bedrock and Amazon Redshift ML are supported. To get started, see Amazon Redshift ML integration with Amazon Bedrock.
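To give a feel for what “LLMs from simple SQL commands” can look like, the sketch below assembles a model definition and a query that calls it. It only builds strings. The statement shape follows our reading of the Amazon Redshift ML documentation, so verify it there before use; the model name, function name, role ARN, model ID, prompt, and table are all hypothetical.

```python
def quote(value: str) -> str:
    """Render a Python string as a SQL string literal."""
    return "'" + value.replace("'", "''") + "'"


def bedrock_model_statements(model: str, function: str, role_arn: str,
                             model_id: str, prompt: str) -> dict:
    """Return SQL that registers a Bedrock-backed model and a sample call."""
    create = (
        f"CREATE EXTERNAL MODEL {model}\n"
        f"FUNCTION {function}\n"
        f"IAM_ROLE {quote(role_arn)}\n"
        "MODEL_TYPE BEDROCK\n"
        f"SETTINGS (MODEL_ID {quote(model_id)}, PROMPT {quote(prompt)});"
    )
    # The generated function is then called like any scalar SQL function.
    sample = f"SELECT review_id, {function}(review_text) FROM reviews LIMIT 10;"
    return {"create": create, "sample": sample}


# Hypothetical identifiers, for illustration only.
sql = bedrock_model_statements(
    "review_summarizer",
    "summarize_review",
    "arn:aws:iam::123456789012:role/ExampleRedshiftMLRole",
    "anthropic.claude-v2:1",
    "Summarize the following review in one sentence:",
)
print(sql["create"])
print(sql["sample"])
```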
Announcing general availability of auto-copy for Amazon Redshift
Amazon Redshift announces the general availability of auto-copy, which simplifies data ingestion from Amazon S3 into Amazon Redshift. This new feature enables you to set up continuous file ingestion from your S3 prefix and automatically load new files to tables in your Redshift data warehouse without the need for additional tools or custom solutions.
Amazon Redshift auto-copy from Amazon S3 is now generally available for both Redshift Serverless and Amazon Redshift RA3 Provisioned data warehouses in all AWS commercial Regions.
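Continuous ingestion is set up by attaching a copy job to a regular COPY statement. The following sketch builds such a statement as a string; the table, S3 prefix, role ARN, and job name are hypothetical, and the exact options you need (for example, the file format) depend on your data, so treat this as a starting point to check against the COPY JOB documentation.

```python
def auto_copy_statement(table: str, s3_prefix: str, role_arn: str,
                        job_name: str, file_format: str = "CSV") -> str:
    """Return a COPY statement that creates an auto-copy job for an S3 prefix."""
    def quote(value: str) -> str:
        # Double any single quotes so the value is a valid SQL string literal.
        return "'" + value.replace("'", "''") + "'"

    return (
        f"COPY {table}\n"
        f"FROM {quote(s3_prefix)}\n"
        f"IAM_ROLE {quote(role_arn)}\n"
        f"FORMAT {file_format}\n"
        f"JOB CREATE {job_name}\n"
        "AUTO ON;"  # load new files under the prefix as they arrive
    )


# Hypothetical names, for illustration only.
statement = auto_copy_statement(
    "public.orders",
    "s3://amzn-s3-demo-bucket/orders/",
    "arn:aws:iam::123456789012:role/ExampleRedshiftCopyRole",
    "orders_auto_copy",
)
print(statement)
```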
Amazon DataZone
AWS announces general availability of Data Lineage in Amazon DataZone and the next generation of SageMaker, a capability that automatically captures lineage from AWS Glue and Amazon Redshift to visualize lineage events from source to consumption. Being OpenLineage compatible, this feature allows data producers to augment the automated lineage with lineage events captured from OpenLineage-enabled systems or through an API, to provide a comprehensive data movement view to data consumers. This feature automates lineage capture of schema and transformations of data assets and columns from AWS Glue, Amazon Redshift, and Spark executions in tools to maintain consistency and reduce errors. Additionally, the data lineage feature versions lineage with each event, enabling you to visualize lineage at any point in time or compare transformations across an asset’s or job’s history.
Amazon DataZone now enhances data access governance with enforced metadata rules
Amazon DataZone now supports enforced metadata rules for data access workflows, providing organizations with enhanced capabilities to strengthen governance and compliance with their organization needs. This new feature allows domain owners to define and enforce mandatory metadata requirements, making sure data consumers provide essential information when requesting access to data assets in Amazon DataZone. By streamlining metadata governance, this capability helps organizations meet compliance standards, maintain audit readiness, and simplify access workflows for greater efficiency and control.
Amazon DataZone expands data access with tools like Tableau, Power BI, and more
Amazon DataZone now supports authentication with the Athena JDBC driver, enabling data users to query their project’s subscribed data lake assets in Amazon DataZone using popular business intelligence (BI) and analytics tools such as Tableau, Domino, Power BI, Microsoft Excel, SQL Workbench, and more. Data analysts and scientists can seamlessly access and analyze governed data in Amazon DataZone using a standard JDBC connection with their preferred tools.
This feature is now available in all the AWS commercial Regions where Amazon DataZone is supported. Check out Expanding data analysis and visualization options: Amazon DataZone now integrates with Tableau, Power BI, and more and Connecting Amazon DataZone with external applications via JDBC connectivity to learn more about how to connect Amazon DataZone to external analytics tools via JDBC.
Amazon QuickSight
Announcing scenarios analysis capability of Amazon Q in QuickSight (preview)
A new scenario analysis capability of Amazon Q in QuickSight is now available in preview. This new capability provides an AI-assisted data analysis experience that helps you make better decisions, faster. Amazon Q in QuickSight simplifies in-depth analysis with step-by-step guidance, saving hours of manual data manipulation and unlocking data-driven decision-making across your organization. You can ask a question or state your goal in natural language, and Amazon Q in QuickSight guides you through every step of advanced data analysis: suggesting analytical approaches, automatically analyzing data, surfacing relevant insights, and summarizing findings with suggested actions.
Amazon QuickSight now supports prompted reports and reader scheduling for pixel-perfect reports
We’re enabling QuickSight readers to generate filtered views of pixel-perfect reports and create schedules to deliver reports through email. Readers can create up to five schedules per dashboard for themselves. Previously, only dashboard owners could create schedules and only on the default (author published) view of the dashboard. Now, if an author has added controls to the pixel-perfect report, schedules can be created or updated to respect selections on the filter control.
Prompted reports and reader scheduling are now available in all supported QuickSight Regions. See Amazon QuickSight endpoints and quotas for QuickSight Regional endpoints.
Amazon Q in QuickSight unifies insights from structured and unstructured data
Amazon Q in QuickSight provides you with unified insights from structured and unstructured data sources through integration with Amazon Q Business. With data stories in Amazon Q in QuickSight, you can upload documents, or connect to unstructured data sources from Amazon Q Business, to create richer narratives or presentations explaining your data with additional context. This integration enables organizations to harness insights from all their data without the need for manual collation, leading to more informed decision-making, time savings, and a significant competitive edge.
Amazon Q Business now provides insights from your databases and data warehouses (preview)
AWS announces the public preview of the integration between Amazon Q Business and QuickSight, delivering a transformative capability that unifies answers from structured data sources (databases, warehouses) and unstructured data (documents, wikis, emails) in one application.
With the QuickSight integration, you can now link your structured sources to Amazon Q Business through the extensive set of data source connectors available in QuickSight. This integration unifies insights across knowledge sources, helping organizations make more informed decisions while reducing the time and complexity traditionally required to gather insights.
Amazon OpenSearch Service
Amazon OpenSearch Service zero-ETL integration with Amazon Security Lake
Amazon OpenSearch Service now offers a zero-ETL integration with Amazon Security Lake, enabling you to query and analyze security data in place directly through OpenSearch. This integration allows you to efficiently explore voluminous data sources that were previously cost-prohibitive to analyze, helping you streamline security investigations and obtain comprehensive visibility of your security landscape.
Amazon OpenSearch Ingestion now supports writing security data to Amazon Security Lake
Amazon OpenSearch Ingestion now allows you to write data into Amazon Security Lake in real time, allowing you to ingest security data from both AWS and custom sources and uncover valuable insights into potential security issues in near real time. With this feature, you can now use OpenSearch Ingestion to ingest and transform security data from popular third-party sources like Palo Alto, CrowdStrike, and SentinelOne into OCSF format before writing the data into Amazon Security Lake. After the data is written to Amazon Security Lake, it’s available in the AWS Glue Data Catalog and Lake Formation tables for the respective source.
AWS Clean Rooms
AWS Clean Rooms now supports multiple clouds and data sources
AWS Clean Rooms announces support for collaboration with datasets from multiple clouds and data sources. This launch allows companies and their partners to collaborate with data stored in Snowflake and Athena, without having to move or share their underlying data among collaborators.
Conclusion
re:Invent 2024 showcased how AWS continues to push the boundaries of data and analytics, delivering tools and services that empower organizations to derive faster, smarter, and more actionable insights. From advancements in data lakes, data warehouses, and streaming solutions to the integration of generative AI capabilities, these announcements are designed to transform the way businesses interact with their data.
As we look ahead, it’s clear that AWS is committed to helping organizations stay ahead in an increasingly data-driven world. Whether you’re modernizing your analytics stack or exploring new possibilities with AI and ML, the innovations from re:Invent 2024 provide the building blocks to unlock value from your data.
Stay tuned for more deep dives into these announcements, and don’t hesitate to explore how these tools can accelerate your journey toward data-driven success!
About the Authors
Sakti Mishra serves as Principal Data and AI Solutions Architect at AWS, where he helps customers modernize their data architecture and define end-to-end data strategies, including data security, accessibility, governance, and more. He is also the author of Simplify Big Data Analytics with Amazon EMR and AWS Certified Data Engineer Study Guide books. Outside of work, Sakti enjoys learning new technologies, watching movies, and visiting places with family. He can be reached via LinkedIn.
Navnit Shukla serves as an AWS Specialist Solutions Architect with a focus on analytics. He possesses a strong enthusiasm for assisting clients in discovering valuable insights from their data. Through his expertise, he constructs innovative solutions that empower businesses to arrive at informed, data-driven choices. Notably, Navnit Shukla is the accomplished author of the book titled “Data Wrangling on AWS.” He can be reached via LinkedIn.