Improve Your Lakehouse: Your How-To Guide for Converting to Unity Catalog Managed Tables


The new SET MANAGED command provides a seamless mechanism to convert UC external tables to UC managed tables while minimizing downtime, handling concurrent writes, maintaining table configurations, and, where possible, preserving table history. This article shares best practices and provides a step-by-step guide for using this generally available (GA) command.

Why Convert to UC Managed Tables?

With Unity Catalog as the source of truth, managed tables unlock unique capabilities that improve performance, governance, and ease of use, without vendor lock-in.

Key advantages include:

  • Automated optimizations that can boost query performance by 20x and cut storage costs by 50%+.
  • Streamlined data management with automated cleanup of dropped data to save on costs, as well as undrop support.
  • Enhanced governance with data lineage, fine-grained access controls, and more secure table access, with Unity Catalog oversight over all reads and writes.
  • A foundation for future capabilities such as automated row deletion (Auto-TTL) and row-level ingestion (Zerobus Ingest, in Private Preview).

Converted tables support reads from any third-party client.

How Can the SET MANAGED Conversion Command Help?

The SET MANAGED command makes conversion from external to managed tables easier. Each feature below is paired with the benefit it provides:

  • Minimize Downtime: Keep the table online and available for reads using Databricks Runtime 16.1 or above, and limit downtime to just a few minutes for writes (or for reads on Databricks Runtime 15.4 or below).
  • Preserve Identity: The table’s name, permissions, tags, and settings are retained for all tables, and table history is retained for Delta tables.
  • Handle Concurrency: The SET MANAGED command safely handles concurrent writes that may occur during the conversion.
  • Roll Back: A companion command, UNSET MANAGED, rolls a converted table back to UC external within 14 days, as a safety net.

How Do I Convert from External to Managed Tables?

A Practitioner’s Step-By-Step Guide for Conversion

The SET MANAGED command makes table conversion straightforward. In this step-by-step guide, we’ve outlined key tips to ensure a smooth transition from external to managed tables.

Step 1: Select External Tables to Convert

Begin by selecting a few Unity Catalog external tables to convert to UC managed first, to familiarize your team with the process, prerequisites, and post-conversion steps.

For example, you can try this command first on a few tables that are exclusively read and written by Databricks clients (see Planning a Staged Journey).

Step 2: Pre-Flight Checklist

Confirm that your ecosystem of table readers and writers is ready for the change. For each selected UC external table and its associated workloads, you’ll want to:

  1. Update to Use Name-Based Access: Check your jobs, notebooks, and queries to ensure they access the table using its three-part name (catalog.schema.table) rather than path-based access (e.g., SELECT * FROM delta.`s3://path/to/table`). Databricks Labs has developed UCX tooling that can help you find path-based references: run the databricks labs ucx lint-local-code command from an IDE terminal to analyze the code (.py or .sql files) in a directory on your local machine.
  2. Cancel All Maintenance Jobs: To prevent conflicts, ensure no OPTIMIZE, ZORDER, or CLUSTER BY jobs are running or scheduled to run on the table during the conversion process (you can check whether any exist using DESCRIBE HISTORY). After the conversion, Predictive Optimization will automatically handle optimization jobs.
  3. [Optional] Upgrade Databricks Runtime Versions: All Databricks clusters reading from or writing to the table should ideally be on Databricks Runtime 15.4 LTS or higher to retain full table history for Delta tables. Databricks Runtime 16.1 or higher can eliminate reader downtime entirely.
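As a rough local stand-in for the first checklist item, the Python sketch below scans source text for format-qualified path access such as delta.`s3://...`. It is not UCX and does not reproduce its checks; the regular expression, the list of formats, and the function name find_path_refs are illustrative assumptions, and it will miss path-based access written in other forms (for example, paths passed to DataFrame readers).

```python
import re

# Assumption: path-based access looks like <format>.`<something containing a slash>`.
# The set of formats here is illustrative, not exhaustive.
PATH_REF = re.compile(r"\b(?:delta|parquet)\.`[^`]*/[^`]*`", re.IGNORECASE)


def find_path_refs(source: str) -> list[tuple[int, str]]:
    """Return (line_number, matched_text) for each path-based reference found."""
    hits = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        for match in PATH_REF.finditer(line):
            hits.append((lineno, match.group(0)))
    return hits


sample = (
    "SELECT * FROM delta.`s3://path/to/table`\n"
    "SELECT * FROM main.sales.orders\n"
)
for lineno, text in find_path_refs(sample):
    print(f"line {lineno}: {text}")
```

Any hit is a candidate for rewriting to the table's three-part name before conversion.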

Step 3: Run the Conversion Command

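Execute the conversion by running SET MANAGED as an ALTER TABLE statement against the table’s three-part name, for example ALTER TABLE catalog.schema.my_external_table SET MANAGED.

Note: For tables with UniForm enabled, use SET MANAGED TRUNCATE UNIFORM HISTORY.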
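If you generate the statement programmatically, a minimal Python sketch might look like the following. The helper names quote_ident and set_managed_sql and the example table main.sales.orders are hypothetical; the sketch only builds the SQL text and assumes you run it yourself in a Databricks SQL context.

```python
def quote_ident(name: str) -> str:
    """Backtick-quote an identifier, doubling any embedded backticks."""
    return "`" + name.replace("`", "``") + "`"


def set_managed_sql(catalog: str, schema: str, table: str, uniform: bool = False) -> str:
    """Build the ALTER TABLE ... SET MANAGED statement for one table.

    uniform=True appends TRUNCATE UNIFORM HISTORY, as the note above
    describes for tables with UniForm enabled.
    """
    full_name = ".".join(quote_ident(part) for part in (catalog, schema, table))
    suffix = " TRUNCATE UNIFORM HISTORY" if uniform else ""
    return f"ALTER TABLE {full_name} SET MANAGED{suffix}"


# Hypothetical table name, used only for illustration.
print(set_managed_sql("main", "sales", "orders"))
print(set_managed_sql("main", "sales", "orders_uniform", uniform=True))
```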

Step 4: Verify the Result

After the command completes, confirm that the conversion was successful by checking the table’s metadata, for example by running DESCRIBE EXTENDED on the table.

In the output of this command, the “Type” property should now display as “MANAGED”. You can also see this same information in the ‘About this table’ section of Catalog Explorer.
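The same check can be scripted. The Python sketch below assumes the metadata rows arrive as (col_name, data_type, comment) tuples, which is the shape DESCRIBE EXTENDED returns in Spark; the function name is_managed and the sample rows are made up for illustration.

```python
def is_managed(describe_rows) -> bool:
    """Return True if the table metadata reports Type = MANAGED.

    The last "Type" row is used, because detail rows follow the column
    listing and a table could also have a data column named "Type".
    """
    table_type = None
    for col_name, data_type, *_ in describe_rows:
        if (col_name or "").strip() == "Type":
            table_type = (data_type or "").strip().upper()
    return table_type == "MANAGED"


# Made-up rows in the assumed (col_name, data_type, comment) shape.
rows = [
    ("id", "bigint", None),
    ("Type", "string", None),  # a data column that happens to be named "Type"
    ("", "", ""),
    ("# Detailed Table Information", "", ""),
    ("Type", "MANAGED", ""),
]
print(is_managed(rows))
```

In a notebook, the rows could come from something like spark.sql("DESCRIBE EXTENDED ...").collect(), since each Spark Row unpacks like a tuple.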

Step 5: Post-Conversion Housekeeping

After a successful conversion, complete these final steps to ensure a smooth transition:

  • Restart streaming read or write jobs that use the table if any have paused.
  • Perform functional testing by running key queries to ensure all readers and writers are working as expected on the newly managed table.
  • Confirm that Predictive Optimization is now enabled for the table to begin benefiting from automated maintenance (you can also enable CLUSTER BY AUTO, for automatic liquid clustering, or check whether it’s been enabled).

Planning a Staged Journey

A successful conversion of all tables to UC managed is a journey; adopting a phased approach and planning ahead can help ensure a smooth transition:

  1. Convert Databricks-Only Tables: Prioritize converting tables that are exclusively read from and written to by Databricks clients. An experimental tool, Access Insights, can be used to help identify tables with only “Databricks readers and writers” vs. “Non-Databricks readers” or “Non-Databricks writers”.
  2. Convert Tables with Supported External Tools: Determine which tables are accessed by third-party tools that also natively support reads from UC managed tables, and convert these next. Third-party access will continue working after conversion.
  3. Handle Complex Cases Last: For tables accessed by unsupported legacy tools, plan to use features like Compatibility Mode for reads. Where third-party writes are required, re-create these tables and enable writes to these UC managed tables, a capability currently in Preview.
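The three stages above can be expressed as a simple sorting rule. The Python sketch below is only an illustration of that rule: the TableAccess fields and table names are hypothetical, and in practice the inputs would come from whatever access inventory you have (for example, the output of a tool like Access Insights).

```python
from dataclasses import dataclass


@dataclass
class TableAccess:
    """Hypothetical access summary for one external table."""
    name: str
    external_readers: bool          # any non-Databricks readers
    external_writers: bool          # any non-Databricks writers
    readers_support_managed: bool   # all external readers can read UC managed tables


def conversion_wave(t: TableAccess) -> int:
    """Map a table to stage 1, 2, or 3 of the phased plan."""
    if not t.external_readers and not t.external_writers:
        return 1  # Databricks-only tables go first
    if not t.external_writers and t.readers_support_managed:
        return 2  # third-party readers that natively support managed tables
    return 3      # legacy readers or third-party writers: handle last


inventory = [
    TableAccess("main.sales.orders", False, False, True),
    TableAccess("main.sales.returns", True, False, True),
    TableAccess("main.legacy.exports", True, True, False),
]
for t in sorted(inventory, key=conversion_wave):
    print(conversion_wave(t), t.name)
```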

Additional Considerations

The following details about the conversion command may be useful to know in advance:

  • Rollback Time Limit: To use the rollback safety net, UNSET MANAGED must be run on the UC managed table within 14 days of conversion. After that, the original external data will be permanently deleted to save on storage costs.
  • Time Travel Nuances: Upgrading clients to Databricks Runtime 15.4 LTS or higher can be helpful. For clusters running on Databricks Runtime 14.3 LTS or below, or if you use the UNSET MANAGED command to roll back, you can only time travel to historical commits by version number after conversion, not by timestamp.
  • Minimized Downtime for Writers: The command is designed to minimize downtime. Writers may experience a brief outage (estimated at between 1 and 5 minutes) during the final phase, when the table’s location is switched to the new managed location.
  • Temporary Delta Sharing Interruption: Delta Sharing will be briefly interrupted during conversion, but it will function properly again once the process is complete.
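To keep track of the 14-day rollback window, a small Python sketch like the one below can compute the deadline from a recorded conversion time. The function names are hypothetical, and treating the deadline as exclusive is an assumption; leave yourself margin rather than relying on the exact boundary.

```python
from datetime import datetime, timedelta, timezone

# The 14-day window comes from the rollback time limit described above.
ROLLBACK_WINDOW = timedelta(days=14)


def rollback_deadline(converted_at: datetime) -> datetime:
    """Latest point at which UNSET MANAGED is still expected to be possible."""
    return converted_at + ROLLBACK_WINDOW


def can_roll_back(converted_at: datetime, now: datetime) -> bool:
    """True while 'now' is still inside the rollback window (assumed exclusive)."""
    return now < rollback_deadline(converted_at)


# Made-up conversion time, used only for illustration.
converted = datetime(2025, 1, 1, 9, 0, tzinfo=timezone.utc)
print(rollback_deadline(converted).isoformat())
print(can_roll_back(converted, datetime(2025, 1, 10, tzinfo=timezone.utc)))
```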

Pro-Tip: Scaling Up with Bulk Conversion

To convert hundreds or thousands of Unity Catalog external tables in bulk within a given schema, you can use a simple SQL script that runs SET MANAGED on each external table in the schema.

Note: Such a script performs live changes. It’s highly recommended to test it thoroughly in a development environment before running it in production.
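One way to approach bulk conversion, sketched here in Python rather than as a SQL script, is to list the tables in a schema and generate one SET MANAGED statement per external table. The function names and table names are hypothetical, and the sketch assumes the listing is a sequence of (table_name, table_type) pairs in which external tables are reported as EXTERNAL, as in the catalog's information_schema.tables view. It only builds statements; executing them is left to you.

```python
def quote_ident(name: str) -> str:
    """Backtick-quote an identifier, doubling any embedded backticks."""
    return "`" + name.replace("`", "``") + "`"


def bulk_set_managed_sql(catalog: str, schema: str, tables) -> list[str]:
    """Build one ALTER TABLE ... SET MANAGED statement per external table.

    'tables' is an iterable of (table_name, table_type) pairs; anything
    whose type is not EXTERNAL (managed tables, views, ...) is skipped.
    """
    statements = []
    for table_name, table_type in tables:
        if (table_type or "").upper() != "EXTERNAL":
            continue
        full_name = ".".join(quote_ident(p) for p in (catalog, schema, table_name))
        statements.append(f"ALTER TABLE {full_name} SET MANAGED")
    return statements


# Made-up listing, used only for illustration.
listing = [("orders", "EXTERNAL"), ("customers", "MANAGED"), ("v_orders", "VIEW")]
for stmt in bulk_set_managed_sql("main", "sales", listing):
    print(stmt)
    # In a Databricks notebook you might run each statement with spark.sql(stmt),
    # ideally against a development schema first.
```

Generating the statements separately from running them makes it easy to review the list, or convert in small batches, before making live changes.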

 

Controlling Your Data’s Physical Location

Unity Catalog (UC) managed tables reside in customer-managed storage and are accessible through open catalog APIs. If you want more control over how your data is physically stored, you can define a managed storage location at the catalog or schema level; any new managed tables created in that catalog or schema will be automatically organized in that specified location.

For pre-existing external tables, you can set a managed storage location, then use the SET MANAGED command to convert them to UC managed tables. During conversion, the system respects the managed location you’ve defined, giving you control over the physical layout of your data in cloud storage. Please contact your account team to access this feature, which is currently in Private Preview.

Converting from External to Managed Tables Today

In just a few short months since Public Preview, hundreds of customers have successfully converted thousands of tables with SET MANAGED.

Everything described here is now GA. Try it out today and unlock the performance, governance, and simplicity of Unity Catalog managed tables.