Today, Databricks support for Iceberg v3 enters Public Preview, unlocking the latest innovations from the Iceberg community natively on the open lakehouse.
Iceberg v3 marks a major step forward for open table formats, unlocking use cases across incremental data processing and semi-structured data analysis that previously required brittle workarounds. Beyond this, Iceberg v3 represents a significant technological advance by further unifying the data layer of Iceberg and Delta Lake, eliminating the need to rewrite data when building interoperable pipelines.
Here's what's new in Iceberg v3, why it matters, and why Databricks is the best place to run your lakehouse.
What’s new in Iceberg v3?
Unity Catalog managed Iceberg v3 tables support Row Lineage, Deletion Vectors, and VARIANT, unlocking new use cases and significant performance benefits. Databricks can also interoperate with these features on foreign Iceberg tables (Iceberg tables registered in other catalogs), enabling customers to build agents and AI applications against their data, regardless of where it lives.
Incremental Processing at Scale: Row Lineage and Deletion Vectors
Most data arrives as a stream of changes (INSERTs, UPDATEs, MERGEs, DELETEs) rather than in batches, typically sourced from operational databases, event streams, and third-party APIs. Historically, processing these changes required solving two hard problems:
- Identifying which rows changed in bronze datasets
- Applying those changes efficiently to silver/gold datasets
Teams often resorted to full table scans or external CDC systems to detect changes, and to expensive file rewrites to apply them. The result was pipelines that were slow, costly to maintain, and prone to drift and data silos.
Now, row lineage lets teams quickly identify which rows changed. Every row in an Iceberg v3 table carries a permanent row ID and a sequence number reflecting when the row was last modified.
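The Iceberg v3 specification exposes this lineage as per-row metadata fields. As an illustrative sketch (the `_row_id` and `_last_updated_sequence_number` names follow the spec; the table and checkpoint value are hypothetical, and exact metadata exposure can vary by engine and runtime version):

```sql
-- Find rows modified since the last processed checkpoint, using
-- row-lineage metadata instead of a full-table diff.
SELECT _row_id,
       _last_updated_sequence_number,
       order_id,
       amount
FROM bronze.orders
WHERE _last_updated_sequence_number > 42;  -- hypothetical checkpoint value
```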

Additionally, deletion vectors make applying changes to datasets more performant than ever. Deletion vectors let Iceberg track which rows have been logically deleted without immediately rewriting the underlying data files. Instead of physically deleting rows by rewriting large Parquet files, the engine writes a lightweight delete file alongside the data. The result is data manipulation performance that is up to 10x faster than the traditional copy-on-write approach.
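With deletion vectors enabled, an ordinary DELETE takes this merge-on-read path. A minimal sketch (table and predicate are hypothetical):

```sql
-- Instead of rewriting every Parquet file containing a matching row,
-- the engine records the deleted row positions in a small delete file.
DELETE FROM silver.orders
WHERE order_status = 'CANCELLED';
```

Scans apply the deletion vectors at read time, so readers always see a consistent view, and background compaction later reconciles the delete files into the data files.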

With Deletion Vectors now native to Iceberg, Geodis can build its Iceberg lakehouse on Databricks without compromising on performance or engine choice.
"Now that Deletion Vectors have come to Iceberg, we can centralize our Iceberg data estate in Unity Catalog, while leveraging the engine of our choice and maintaining best-in-class performance." — Delio Amato, Chief Architect & Data Officer, Geodis
Together, row lineage and deletion vectors make CDC a native property of the table itself. Teams can build pipelines that incrementally process only what actually changed, cutting costs and driving faster time-to-insight for every analyst and data scientist downstream.

Semi-Structured Data as a First-Class Citizen via VARIANT
Logs, API responses, clickstreams, and IoT payloads are highly valuable semi-structured data sources. As they evolve, AI models can adapt alongside them, learning directly from changing real-world signals.
Historically, however, data teams faced a painful tradeoff when working with semi-structured data. One standard approach was to enforce rigid schemas, but this led to brittle pipelines that broke every time the upstream data evolved. Another common workaround was to store the data as raw string dumps, but this made queries complex and slow. Neither approach was scalable.
The Iceberg v3 VARIANT type resolves this tradeoff. VARIANT is a native column type that stores semi-structured payloads alongside relational columns in the same Iceberg table. No flattening, separate storage system, or normalization ETL pipeline is required. Instead, data teams can ingest raw semi-structured data as-is and query it with standard SQL.
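A minimal sketch of that workflow (catalog, schema, and field names are placeholders; the path-extraction syntax shown is Databricks SQL):

```sql
-- A VARIANT column sits alongside ordinary relational columns.
CREATE TABLE main.events.clickstream (
  event_id STRING,
  payload  VARIANT
);

-- Ingest raw JSON as-is, with no upfront schema.
INSERT INTO main.events.clickstream
  SELECT 'e-001', PARSE_JSON('{"user": {"id": 42}, "action": "click"}');

-- Query nested fields with standard SQL path expressions.
SELECT payload:user.id::INT   AS user_id,
       payload:action::STRING AS action
FROM main.events.clickstream;
```

If the payload later gains new fields, they are queryable with the same path syntax, with no schema migration on the table.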

Panther uses VARIANT to power large-scale ingestion and analytics across semi-structured security logs.
"Unity Catalog and Iceberg v3 unlock the power of semi-structured data through VARIANT. This enables interoperability and cost-effective, petabyte-scale log collection." — Russell Leighton, Chief Architect, Panther
With VARIANT, your AI models and analytics pipelines work directly against live, evolving data in a single governed table. When new fields appear in API responses or new event types enter clickstreams, they're queryable immediately without a schema migration. With performance optimizations like shredding, customers can get columnar-like performance on their semi-structured data, unlocking low-latency BI, dashboards, and alerting pipelines.
Unity Catalog delivers interoperability and performance for multi-engine, multi-catalog enterprises
Modern enterprises rely on multiple engines and catalogs to support diverse use cases across business units and legacy systems. Unity Catalog was designed to enable interoperability and governance across catalogs, while also optimizing data layouts based on query patterns.
Unified governance across catalogs and engines
Unity Catalog's open APIs let customers write once and read anywhere – no more data duplication or siloed access controls. UC can federate to other Iceberg catalogs, enabling bi-directional interoperability. All Iceberg data in Snowflake, AWS Glue, Salesforce, and other leading catalogs can be read by Unity Catalog, and all data in UC can be accessed by those same third-party platforms via open APIs.

Beyond this, Unity Catalog is the first catalog to support fine-grained access control on external engines, empowering teams to define row filters and column masks once and have them enforced everywhere data is accessed. Centralizing governance on Unity Catalog makes it significantly easier for security teams to govern and monitor their lakehouse, while also giving data teams the autonomy to point any tool at their lakehouse.

Delta and Iceberg interoperability
Delta Lake with UniForm unlocks interoperability across customers' Delta Lake and Iceberg ecosystems: write once to Delta Lake, and read as Iceberg from Snowflake, BigQuery, Redshift, Athena, Trino, or any other Iceberg engine. With Iceberg v3 adopting Deletion Vectors, Row Lineage, and VARIANT natively, customers no longer face a tradeoff between Delta Lake performance features and Iceberg compatibility. The result is a single copy of data that serves every engine in your stack, with no replication pipelines to maintain and no risk of drift. A leading financial services provider replaced a costly full-table replication service with UniForm, letting Snowflake read directly from Unity Catalog managed tables.
Automatic performance and optimization
Beyond interoperability, Databricks brings performance, layout optimization, and governance together in a single system, so teams don't have to stitch these capabilities together themselves. Databricks combines intelligent maintenance (Predictive Optimization), physical layout optimization based on query patterns (Automatic Liquid Clustering), and cross-engine governance (Unity Catalog) in one layer, with no manual configuration required.
Other managed Iceberg offerings require teams to manage table maintenance, file layout, and access policy enforcement independently. On Databricks, these capabilities are unified and automatic, removing an entire class of operational overhead while preserving full data portability.
Get Started with Apache Iceberg v3 on Databricks
Iceberg v3 on Databricks is in Public Preview today! Teams can now take advantage of the best features across Delta and Iceberg without trading off between performance and interoperability.
Iceberg v3 is available on Databricks Runtime 18.0+ with Unity Catalog enabled.
Creating a Unity Catalog managed Iceberg table with v3 enabled is straightforward:
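A hedged sketch (catalog and schema names are placeholders; `format-version` follows the standard Iceberg table-property convention, so check the runtime docs for the exact flag on your version):

```sql
-- A Unity Catalog managed Iceberg table pinned to format version 3.
CREATE TABLE main.sales.orders (
  order_id  BIGINT,
  amount    DECIMAL(10, 2),
  raw_event VARIANT  -- v3 feature: native semi-structured column
)
USING ICEBERG
TBLPROPERTIES ('format-version' = '3');
```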
Creating a Unity Catalog managed Delta table with UniForm and v3 enabled is just as simple:
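A sketch under the same assumptions (`delta.universalFormat.enabledFormats` is the documented UniForm flag for Iceberg metadata generation; the `delta.enableIcebergCompatV3` property shown here is illustrative, so verify the exact compatibility flag for v3 in the runtime docs):

```sql
-- A managed Delta table that also exposes Iceberg v3 metadata via UniForm.
CREATE TABLE main.sales.orders_delta (
  order_id  BIGINT,
  amount    DECIMAL(10, 2),
  raw_event VARIANT
)
TBLPROPERTIES (
  'delta.enableIcebergCompatV3'          = 'true',    -- illustrative flag
  'delta.universalFormat.enabledFormats' = 'iceberg'
);
```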
Looking ahead: Iceberg v4
Iceberg v3 unifies the data layer across Delta and Iceberg on a performant, interoperable foundation – the next frontier is the metadata layer. Databricks engineers are actively driving several core Iceberg v4 proposals in the Apache community to make metadata simpler, faster, and more scalable. These include the adaptive metadata tree, which simplifies the metadata structure so that most operations require writing only a single file instead of several. Additional proposals include relative path support for seamless table relocation across environments, and a modernized statistics model that extends to newer data types like VARIANT and GEOMETRY. Together, these advances will mean faster ingestion, more efficient query planning, and simpler table management at enterprise scale. We're excited to continue advancing the Iceberg specification with the community.
Learn more at Data and AI Summit
Get started with Iceberg v3 and join us at the upcoming Data and AI Summit in San Francisco, June 15-18, 2026, to learn more about our Iceberg roadmap and work across the ecosystem.