physique.orbit:not(.legacy-content) .pageblock .elementor-widget-container:has(> desk) {
max-width: 100%;
overflow-x: unset;
border-radius: 0px;
box-shadow: none;
}
physique.orbit:not(.legacy-content) .pageblock desk tr:first-child td {
background-color: #8017e1 !necessary;
colour: #fff !necessary;
}
physique.orbit:not(.legacy-content) .pageblock desk tbody > tr:nth-child(2n+1) > td {
background-color: clear;
}
physique.orbit tbody tr:nth-child(odd) {
background-color: #f7f8fa;
}
physique.orbit h2 {
margin-top: 50px;
}
Key Takeaways
- Join any OpenLineage-compatible orchestrator to the Exactly Information Integrity Suite in minutes — no customized connector required.
- Dataset-level and column-level lineage are each captured routinely based mostly on the occasion payload.
- Lineage is all the time full: when a dataset hasn’t been formally found but, the catalog creates placeholders and routinely enriches them when discovery runs.
Information pipelines have by no means been extra advanced. Trendy knowledge groups run workloads throughout a rising mixture of orchestration instruments — Airflow, Spark, dbt, Dagster — and each new device historically meant a brand new customized connector simply to seize lineage.
The result’s fragmented visibility, brittle integrations, and lineage graphs that go stale the second a device model change. There’s a greater means, and at Exactly, we tackled this problem immediately.
Why Bespoke Lineage Connectors Maintain Information Groups Again
Conventional lineage seize requires a devoted connector for each orchestration device: one for Dagster, one for Airflow, one for dbt, one for Spark. Every connector evolves by itself schedule, breaks model upgrades, and multiplies upkeep burden with each new device added.
We solved this by constructing the Exactly Information Integrity Suite to talk a language that orchestrators already perceive: OpenLineage.
What Is OpenLineage and Why Does It Matter for Information Groups?
OpenLineage is an open normal for metadata and lineage assortment designed to instrument jobs as they run. When a pipeline job is executed, the orchestrator emits a structured occasion payload to any HTTP endpoint that helps the protocol.
As a result of the usual is tool-agnostic and community-maintained, it has achieved broad adoption throughout the fashionable knowledge stack. Fairly than sustaining proprietary connectors, groups get lineage protection that grows routinely because the ecosystem evolves.
Each main orchestration device both ships with built-in help or has a mature neighborhood integration:
| Instrument | OpenLineage Help |
| Dagster | Constructed-in by way of openlineage-dagster |
| Apache Airflow | Constructed-in by way of apache-airflow-providers-openlineage |
| dbt | Constructed-in by way of dbt-core OpenLineage integration |
| Apache Spark | OpenLineage Spark integration (automated column lineage) |
| Apache Flink | OpenLineage Flink integration |
| Trino / Starburst | OpenLineage Trino integration |
In case your workforce makes use of any of those instruments, you might be one configuration change away from automated lineage seize.

Connecting Your Orchestrator
How Do You Join an Orchestrator to the Exactly Information Integrity Suite?
Configure your orchestrator to ship occasions to the Exactly API Gateway:
Endpoint: POST /v2/catalog/lineage
Authentication: API key or bearer token out of your workspace credentials
| Area | Worth |
| US | https://api.cloud.exactly.com |
| EU | https://api.eu1.cloud.exactly.com |
| GB | https://api.gb1.cloud.exactly.com |
| AU | https://api.au1.cloud.exactly.com |
openlineage.yml instance:

No extra setup is required on the catalog aspect. Occasions seem as quickly as your subsequent pipeline run completes.
How Occasions Movement

The endpoint acknowledges every occasion instantly and processes it asynchronously — your orchestrator is rarely blocked ready for catalog writes.
What Ends Up within the Catalog
After a pipeline run completes, you get:
- Searchable, browsable Transformation Job property for each pipeline run
- Lineage edges connecting supply and goal datasets
- Full column-level lineage with transformation labels
- Placeholder property that improve to completely enriched property when discovery runs
The Catalog Idea Mapping
| OpenLineage Idea | Catalog Idea |
| Job (namespace + identify) | A Transformation Job asset, searchable and browsable |
| Run (distinctive run ID) | Tracked for audit |
| Dataset (namespace + identify) | An present catalog asset, or a placeholder |
| Enter → Output edge | A lineage relation |
| Aspects | Asset properties: schema, possession, knowledge high quality, docs |
What Occurs When a Dataset Hasn’t Been Found But?
Pipelines usually run earlier than formal knowledge supply discovery completes. Fairly than dropping lineage edges, the catalog creates placeholder property — absolutely navigable catalog entries with provenance metadata from the occasion. When discovery runs later, the placeholder is enriched with harvested metadata; no lineage edges want rebuilding.
This implies lineage is full from day one — even in environments the place knowledge sources are nonetheless being cataloged. Groups can belief the graph with out ready for full discovery protection.
Professional tip: Dataset/subject identifier matching is actual. A case distinction, a lacking port, or a website prefix mismatch causes the catalog to create a placeholder as a substitute of linking to an present asset. Confirm your OpenLineage producer’s namespace and identify format towards your catalog connection settings earlier than enabling manufacturing lineage seize.
Column-Stage Lineage
How Does Column-Stage Lineage Work?
Dataset-level lineage solutions which desk feeds into which desk. Column-level lineage solutions which column, remodeled how, produces which output column — enabling root-cause evaluation and change-impact evaluation.
Column-level lineage travels within the column Lineage aspect of a COMPLETE occasion. Instruments like Spark and dbt emit this routinely.


Transformation Job: Full Transformation Context
Every column lineage relation hyperlinks to a Transformation Job asset that captures:
| Property | What IT Tells You |
| Identify | The pipeline that produced this column mapping |
| Sort / Subtype | Transformation class (e.g., AGGREGATION / SUM, IDENTITY, TRANSFORMATION) |
| Column Masked | Whether or not the supply worth was masked or anonymized |
| Run ID | The particular run that generated this lineage |
| Namespace | The orchestrator atmosphere (e.g., dagster-prod) |
| Occasion Time | When the pipeline run accomplished |
| Producer | Which device emitted the occasion |
Clever Graph: No Duplicate Paths
When column-level lineage is absolutely resolvable for a supply–goal pair, the catalog shops column-level relations solely. Dataset-level lineage for these pairs is routinely inferred by rollup — so each views seem within the UI with out duplicate edges within the graph. For orchestrators that don’t emit columnLineage, the catalog falls again to dataset-level lineage.
Partial Occasion Resilience
Resolvable column mappings are captured instantly. Unresolvable ones (referencing not-yet-discovered columns) are retried after discovery. An incomplete column mapping by no means blocks the dataset-level lineage or knowledge high quality metadata for a similar occasion.
Reliability You Can Depend On
Protected replays: Re-sending the identical occasion has no impact. Lineage relations are usually not duplicated, Transformation Job property are usually not re-created, and metadata will not be overwritten.
This issues greater than it might sound. In follow, pipeline orchestrators retry on failure, CI/CD techniques replay jobs throughout deployment, and catastrophe restoration procedures re-run historic occasions. With out idempotent occasion dealing with, every of these situations dangers corrupting the lineage graph with duplicate edges or stale metadata. The Exactly Information Integrity Suite processes every occasion precisely as soon as no matter what number of instances it’s acquired.
Any device that emits normal OpenLineage RunEvent payloads to an HTTP endpoint will work.
Abstract
| Functionality | Element |
| ✓ Zero-connector integration | Any OpenLineage-compatible device connects with a URL and a token |
| ✓ Dataset lineage | Computerized lineage relations from each COMPLETE pipeline occasion |
| ✓ Column lineage | Discipline-level lineage with transformation sort, subtype, description, and masking context |
| ✓ Placeholder property | Lineage is full from day one, even earlier than discovery runs |
| ✓ Metadata enrichment | Schema, possession, knowledge supply, and documentation from OpenLineage sides |
| ✓ Protected retries | Duplicate or replayed occasions by no means corrupt catalog state |
| ✓ TransformationJob property | Full provenance path of what remodeled every column and when |
Information pipelines are solely as reliable because the lineage behind them. By constructing on an open normal that the fashionable knowledge stack already speaks, the Exactly Information Integrity Suite makes correct, constant, and contextual lineage automated — so your groups can transfer quick with out second-guessing the place their knowledge got here from.
_____________________________________________________________________
Steadily Requested Questions
Q. Does OpenLineage work with my present orchestrator?
A. In case your orchestrator is Airflow, Spark, dbt, Dagster, Flink, or Trino/Starburst, built-in or mature neighborhood help is on the market. Configuration is a single YAML change pointing to the Exactly API endpoint. In case your device will not be on this checklist, any device that emits normal OpenLineage RunEvent payloads over HTTP will even work with out modification.
Q. What occurs if a dataset hasn’t been found but?
A. The catalog creates a placeholder asset with provenance metadata from the occasion, protecting lineage edges intact. When discovery runs later, the placeholder is routinely enriched with full metadata. No lineage must be rebuilt.
Q. Is dataset-level lineage nonetheless obtainable when column-level lineage is captured?
A. Sure. When column-level lineage is resolvable, dataset-level lineage is routinely inferred by rollup so each views can be found within the catalog UI. There aren’t any duplicate edges within the graph.
Q. What occurs if an occasion is re-sent or replayed?
A. Nothing modifications within the catalog. Occasions are processed idempotently — re-sending the identical occasion doesn’t create duplicate lineage relations, re-create Transformation Job property, or overwrite present metadata.
The put up OpenLineage Integration: Bridging Open Requirements with the Exactly Information Integrity Suite appeared first on Exactly.