Lineage tracing for general data warehouse transformations

The VLDB Journal

Abstract.

Data warehousing systems integrate information from operational data sources into a central repository to enable analysis and mining of the integrated information. During the integration process, source data typically undergoes a series of transformations, which may vary from simple algebraic operations or aggregations to complex “data cleansing” procedures. In a warehousing environment, the data lineage problem is that of tracing warehouse data items back to the original source items from which they were derived. We formally define the lineage tracing problem in the presence of general data warehouse transformations, and we present algorithms for lineage tracing in this environment. Our tracing procedures take advantage of known structure or properties of transformations when present, but also work in the absence of such information. Our results can be used as the basis for a lineage tracing tool in a general warehousing setting, and also can guide the design of data warehouses that enable efficient lineage tracing.
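To make the problem statement concrete, the following is a minimal Python sketch, not the algorithms from the article. It assumes a hypothetical source table of (region, amount) rows and a hypothetical aggregation transformation, `total_by_region`. When the tracer knows that the transformation groups by region, the lineage of a warehouse item is the set of source rows sharing its region. The second tracer, `trace_black_box`, is an illustrative conservative fallback for the case where nothing is known about the transformation; all names and data here are assumptions made for the example.

```python
from collections import defaultdict

# Hypothetical source table: (region, amount) rows.
SOURCE = [
    ("east", 10),
    ("west", 5),
    ("east", 7),
    ("north", 3),
]


def total_by_region(rows):
    """Example transformation: sum the amounts for each region."""
    totals = defaultdict(int)
    for region, amount in rows:
        totals[region] += amount
    return sorted(totals.items())


def trace_aggregate(rows, output_item):
    """Lineage when the transformation is known to group by region:
    the source rows that share the output item's region."""
    region, _ = output_item
    return [row for row in rows if row[0] == region]


def trace_black_box(rows, output_item):
    """Conservative fallback when nothing is known about the
    transformation: any source row may have contributed."""
    return list(rows)


if __name__ == "__main__":
    warehouse = total_by_region(SOURCE)
    item = ("east", 17)
    assert item in warehouse
    print(trace_aggregate(SOURCE, item))
    print(trace_black_box(SOURCE, item))
```

The contrast between the two tracers illustrates why known structure matters: the more a tracer knows about a transformation, the smaller the set of source items it needs to return.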

Additional information

Edited by T. Özsu. Received: September 26, 2001 / Accepted: August 15, 2002 / Published online: January 14, 2003

This work was supported by the National Science Foundation under grants IIS-9811947 and IIS-9817799.

About this article

Cite this article

Cui, Y., Widom, J. Lineage tracing for general data warehouse transformations. The VLDB Journal 12, 41–58 (2003). https://doi.org/10.1007/s00778-002-0083-8

  • DOI: https://doi.org/10.1007/s00778-002-0083-8
