AWS analytics at re:Invent 2025: Unifying data, AI, and governance at scale


re:Invent 2025 showcased the bold Amazon Web Services (AWS) vision for the future of analytics, one where data warehouses, data lakes, and AI development converge into a seamless, open, intelligent platform, with Apache Iceberg compatibility at its core. Across more than 18 major announcements spanning three weeks, AWS demonstrated how organizations can break down data silos, accelerate insights with AI, and maintain robust governance without sacrificing agility.

Amazon SageMaker: Your data platform, simplified

AWS launched a faster, simpler approach to data platform onboarding for Amazon SageMaker Unified Studio. The new one-click onboarding experience eliminates weeks of setup, so teams can start working with existing datasets in minutes using their existing AWS Identity and Access Management (IAM) roles and permissions. Available directly from the Amazon SageMaker, Amazon Athena, Amazon Redshift, and Amazon S3 Tables consoles, this streamlined experience automatically creates SageMaker Unified Studio projects with existing data permissions intact. At its core is a powerful new serverless notebook that reimagines how data professionals work. This single interface combines SQL queries, Python code, Apache Spark processing, and natural language prompts, backed by Amazon Athena for Apache Spark to scale from interactive exploration to petabyte-scale jobs. Data engineers, analysts, and data scientists no longer need to context-switch between different tools based on workload: they can explore data with SQL, build models with Python, and use AI assistance, all in one place.

The introduction of Amazon SageMaker Data Agent in the new SageMaker notebooks marks a pivotal moment in AI-assisted development for data builders. This built-in agent doesn't just generate code; it understands your data context, catalog information, and business metadata to create intelligent execution plans from natural language descriptions. When you describe an objective, the agent breaks down complex analytics and machine learning (ML) tasks into manageable steps, generates the necessary SQL and Python code, and maintains awareness of your notebook environment throughout the entire process. This capability transforms hours of manual coding into minutes of guided development, so teams can focus on gleaning insights rather than repetitive boilerplate.

Embracing open data with Apache Iceberg

One significant theme across this year's launches was the widespread adoption of Apache Iceberg across AWS analytics, transforming how organizations manage petabyte-scale data lakes. Catalog federation to remote Iceberg catalogs through the AWS Glue Data Catalog addresses a critical challenge in modern data architectures. You can now query remote Iceberg tables, stored in Amazon Simple Storage Service (Amazon S3) and catalogued in remote Iceberg catalogs, using preferred AWS analytics services such as Amazon Redshift, Amazon EMR, Amazon Athena, AWS Glue, and Amazon SageMaker, without moving or copying tables. Metadata synchronizes in real time, providing query results that reflect the current state. Catalog federation supports both coarse-grained access control and fine-grained access permissions through AWS Lake Formation, enabling cross-account sharing and trusted identity propagation while maintaining consistent security across federated catalogs.
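To make the federation model concrete, here is a minimal sketch of how a query might address a remote Iceberg table through a federated catalog using the three-part naming convention Athena-style engines use. The catalog name (`remote_iceberg`), database, and table are hypothetical, not values from the announcement.

```python
# Hypothetical sketch: addressing a remote Iceberg table through a
# federated Glue Data Catalog. All identifiers here are illustrative
# assumptions, not documented defaults.
def federated_table_ref(catalog: str, database: str, table: str) -> str:
    """Build the three-part identifier used to resolve a table
    through a federated catalog (catalog.database.table)."""
    return f'"{catalog}"."{database}"."{table}"'

# The same SQL could then run unchanged from Athena, Redshift, or EMR,
# since the catalog resolves the table without copying data.
query = (
    "SELECT order_id, amount "
    f"FROM {federated_table_ref('remote_iceberg', 'sales', 'orders')} "
    "WHERE order_date >= DATE '2025-01-01'"
)
print(query)
```

The point of the three-part name is that the engine, not the user, handles where the table physically lives; switching engines should not require rewriting the reference.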

Amazon Redshift now writes directly to Apache Iceberg tables, enabling true open lakehouse architectures where analytics seamlessly span data warehouses and lakes. Apache Spark on Amazon EMR 7.12, AWS Glue, Amazon SageMaker notebooks, Amazon S3 Tables, and the AWS Glue Data Catalog now support Iceberg V3 capabilities, including deletion vectors, which mark deleted rows without expensive file rewrites, dramatically reducing pipeline costs and accelerating data modifications, and row lineage. V3 automatically tracks every row's history, creating audit trails essential for compliance, and adds table-level encryption that helps organizations meet stringent privacy regulations. These innovations mean faster writes, lower storage costs, comprehensive audit trails, and efficient incremental processing across your data architecture.
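As a rough illustration of opting into Iceberg V3 behavior, the sketch below assembles Spark-style DDL with open-source Iceberg table properties. The property names follow the Apache Iceberg specification; the catalog, schema, and table names are assumptions, and exact support on each AWS engine may differ.

```python
# Hedged sketch: Iceberg table properties that opt a table into V3
# and merge-on-read deletes (recorded as deletion vectors rather than
# file rewrites). Names follow the open-source Iceberg spec; the table
# identifier is an illustrative assumption.
v3_properties = {
    "format-version": "3",                 # opt in to Iceberg V3
    "write.delete.mode": "merge-on-read",  # deletes tracked, not rewritten
}

tblproperties = ", ".join(f"'{k}'='{v}'" for k, v in v3_properties.items())
ddl = (
    "CREATE TABLE lake.sales.orders (order_id BIGINT, amount DOUBLE) "
    f"USING iceberg TBLPROPERTIES ({tblproperties})"
)
print(ddl)
```

Merge-on-read is the design choice that makes deletion vectors pay off: deletes append small metadata instead of rewriting large data files, at the cost of a little extra work at read time.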

Governance that scales with your organization

Data governance received substantial attention at re:Invent with major enhancements to Amazon SageMaker Catalog. Organizations can now curate data at the column level with custom metadata forms and rich text descriptions, indexed in real time for fast discoverability. New metadata enforcement rules require data producers to classify assets with approved business vocabulary before publication, providing consistency across the enterprise. The catalog uses Amazon Bedrock large language models (LLMs) to automatically suggest relevant business glossary terms by analyzing table metadata and schema information, bridging the gap between technical schemas and business language. Perhaps most significantly, SageMaker Catalog now exports its entire asset metadata as queryable Apache Iceberg tables through Amazon S3 Tables. This way, teams can analyze catalog inventory with standard SQL to answer questions like "which assets lack business descriptions?" or "how many confidential datasets were registered last month?" without building custom ETL infrastructure.
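The two example questions above might translate into SQL along these lines. This is illustrative only: the exported table name (`catalog_exports.asset_inventory`) and its columns are assumptions, not the documented SageMaker Catalog export schema.

```python
# Hypothetical queries against exported catalog metadata; table and
# column names are assumptions for illustration.
missing_descriptions = """
SELECT asset_name, owner
FROM catalog_exports.asset_inventory
WHERE business_description IS NULL
ORDER BY asset_name
"""

confidential_last_month = """
SELECT COUNT(*) AS confidential_assets
FROM catalog_exports.asset_inventory
WHERE classification = 'confidential'
  AND registered_at >= DATE_ADD('month', -1, CURRENT_DATE)
"""

print(missing_descriptions.strip())
print(confidential_last_month.strip())
```

Because the export lands as Iceberg tables, any engine that reads Iceberg, not just one vendor tool, can run this kind of inventory audit.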

As organizations adopt multi-warehouse architectures to scale and isolate workloads, the new Amazon Redshift federated permissions capability eliminates governance complexity. Define data permissions once from an Amazon Redshift warehouse, and they are automatically enforced across the warehouses in your account. Row-level, column-level, and masking controls apply consistently regardless of which warehouse queries originate from, and new warehouses automatically inherit permission policies. This horizontal scalability means organizations can add warehouses without increasing governance overhead, and analysts immediately see the databases from registered warehouses.
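For a sense of what "define once" might look like, here is a sketch of Redshift-style row-level security statements; under federated permissions such a policy would apply across warehouses in the account. The table, role, policy name, and predicate are illustrative assumptions, and the exact DDL should be checked against the Redshift documentation.

```python
# Hedged sketch of Redshift-style row-level security. All identifiers
# and the USING predicate are illustrative assumptions.
statements = [
    # Define a policy once: each user sees only rows they own
    "CREATE RLS POLICY owner_only "
    "WITH (owner VARCHAR(64)) "
    "USING (owner = current_user)",
    # Attach the policy to a table for a specific role
    "ATTACH RLS POLICY owner_only ON sales.orders TO ROLE analyst",
    # Turn enforcement on for the table
    "ALTER TABLE sales.orders ROW LEVEL SECURITY ON",
]
for stmt in statements:
    print(stmt + ";")
```

The governance win described in the text is that these three statements would not need repeating per warehouse; new warehouses inherit them.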

Accelerating AI innovation with Amazon OpenSearch Service

Amazon OpenSearch Service launched powerful new capabilities to simplify and accelerate AI application development. With support for OpenSearch 3.3, agentic search enables precise results using natural language inputs without the need for complex queries, making it easier to build intelligent AI agents. The new Apache Calcite-powered PPL engine delivers query optimization and an extensive library of commands for more efficient data processing.

As seen in Matt Garman's keynote, building large-scale vector databases is now dramatically faster with GPU acceleration and auto-optimization. Previously, creating large-scale vector indexes required days of build time and weeks of manual tuning by experts, which slowed innovation and prevented cost-performance optimization. The new serverless auto-optimize jobs automatically evaluate index configurations, including k-nearest neighbors (k-NN) algorithms, quantization, and engine settings, based on your specified search latency and recall requirements. Combined with GPU acceleration, you can build optimized indexes up to ten times faster at 25% of the indexing cost, with serverless GPUs that activate dynamically and bill only when providing speed boosts. These advancements simplify scaling AI applications such as semantic search, recommendation engines, and agentic systems, so teams can innovate faster by dramatically reducing the time and effort needed to build large-scale, optimized vector databases.
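The configuration surface that auto-optimization tunes for you looks roughly like the index body below, a standard OpenSearch k-NN mapping. The field names, dimension, and method settings here are manual-tuning assumptions for illustration; the announcement's point is that the service can now choose such settings from your latency and recall targets.

```python
import json

# Hedged sketch of a k-NN vector index body for OpenSearch. Field
# names, dimension, and method settings are illustrative assumptions;
# auto-optimize jobs would select comparable settings service-side.
index_body = {
    "settings": {"index": {"knn": True}},
    "mappings": {
        "properties": {
            "embedding": {
                "type": "knn_vector",
                "dimension": 768,               # must match the embedding model
                "method": {
                    "name": "hnsw",             # k-NN algorithm choice
                    "space_type": "cosinesimil",
                    "engine": "faiss",
                },
            },
            "title": {"type": "text"},
        }
    },
}
print(json.dumps(index_body, indent=2))
```

Each of these knobs (algorithm, engine, quantization, space type) trades recall against latency and cost, which is exactly the search space the auto-optimize jobs are described as exploring.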

Performance and cost optimization

Also announced in the keynote, Amazon EMR Serverless now eliminates local storage provisioning for Apache Spark workloads, introducing serverless storage that reduces data processing costs by up to 20% while preventing job failures from disk capacity constraints. The fully managed, auto scaling storage encrypts data in transit and at rest with job-level isolation, allowing Spark to release workers immediately when idle rather than keeping them active to preserve temporary data. Additionally, AWS Glue introduced materialized views based on Apache Iceberg, storing precomputed query results that automatically refresh as source data changes. Spark engines across Amazon Athena, Amazon EMR, and AWS Glue intelligently rewrite queries to use these views, accelerating performance by up to eight times while reducing compute costs. The service handles refresh schedules, change detection, incremental updates, and infrastructure management automatically.
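A materialized view of this kind might be declared with DDL like the sketch below. This is a hypothetical illustration: the table names are assumptions, and the exact DDL AWS Glue supports may differ, so treat it as a shape, not a reference.

```python
# Hypothetical sketch of an Iceberg-backed materialized view. The
# catalog/table names are assumptions; consult the AWS Glue docs for
# the supported DDL.
mv_ddl = """
CREATE MATERIALIZED VIEW lake.sales.daily_revenue AS
SELECT order_date, SUM(amount) AS revenue
FROM lake.sales.orders
GROUP BY order_date
"""
print(mv_ddl.strip())
```

The transparent-rewrite behavior means a query written against `lake.sales.orders` with this aggregation could be served from the precomputed view without the query author changing anything.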

The new Apache Spark upgrade agent for Amazon EMR transforms version upgrades from months-long projects into week-long initiatives. Using conversational interfaces, engineers express upgrade requirements in natural language while the agent automatically identifies API changes and behavioral modifications across PySpark and Scala applications. Engineers review and approve suggested changes before implementation, maintaining full control while the agent validates functional correctness through data quality checks. Currently supporting upgrades from Spark 2.4 to 3.5, this capability is available through SageMaker Unified Studio, Kiro CLI, or an integrated development environment (IDE) with Model Context Protocol compatibility.

For workflow optimization, AWS introduced a new serverless deployment option for Amazon Managed Workflows for Apache Airflow (Amazon MWAA), which eliminates the operational overhead of managing Apache Airflow environments while optimizing costs through serverless scaling. This new offering addresses key challenges of operational scalability, cost optimization, and access management that data engineers and DevOps teams face when orchestrating workflows. With Amazon MWAA Serverless, data engineers can focus on defining their workflow logic rather than monitoring provisioned capacity. They can now submit their Airflow workflows for execution on a schedule or on demand, paying only for the actual compute time used during each task's execution.
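The pay-per-task-second model can be made concrete with some back-of-envelope arithmetic. The rates below are made-up placeholders, not AWS pricing; the sketch only shows why a bursty workload favors the serverless model over an always-on environment.

```python
# Back-of-envelope sketch contrasting an always-on Airflow environment
# with the pay-per-task-second serverless model. Rates are made-up
# placeholders, NOT AWS pricing.
ENV_HOURLY = 0.50      # hypothetical always-on environment rate ($/hour)
TASK_SECOND = 0.0001   # hypothetical serverless rate ($/task-second)

def monthly_always_on() -> float:
    """Cost of keeping an environment running 24x7 for 30 days."""
    return ENV_HOURLY * 24 * 30

def monthly_serverless(task_runs: int, avg_task_seconds: float) -> float:
    """Cost when billed only for actual task execution time."""
    return task_runs * avg_task_seconds * TASK_SECOND

always_on = monthly_always_on()                                  # 360.00
serverless = monthly_serverless(task_runs=1_000,
                                avg_task_seconds=90)             # 9.00
print(f"always-on: ${always_on:.2f}/month, "
      f"serverless: ${serverless:.2f}/month")
```

Under these placeholder rates, a thousand 90-second task runs a month cost a small fraction of an idle-most-of-the-time environment, which is the economics the serverless option targets.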

Looking ahead

These launches collectively represent more than incremental improvements. They signal a fundamental shift in how organizations are approaching analytics. By unifying data warehousing, data lakes, and ML under a common framework built on Apache Iceberg, simplifying access through intelligent interfaces powered by AI, and maintaining robust governance that scales effortlessly, AWS is giving organizations the tools to focus on insights rather than infrastructure. The emphasis on automation, from AI-assisted development to self-managing materialized views and serverless storage, reduces operational overhead while improving performance and cost efficiency. As data volumes continue to grow and AI becomes increasingly central to business operations, these capabilities position AWS customers to accelerate their data-driven initiatives with unprecedented simplicity and power. To view the re:Invent 2025 Innovation Talk on analytics, watch Harnessing analytics for humans and AI on YouTube.


About the author

Larry Weber


Larry leads product marketing for the analytics portfolio at AWS.