===== Data lineage ===== === Definition === Data Lineage is [[data_quality_management_system:metadata|metadata]] that identifies the sources of data and the transformations through which it has passed up to the point of applying. (DAMA-NL, 2020) ===Notes 1=== Other definitions: * Data Lineage is a description of the pathway from the data source to their current location and thealterations made to the data along that pathway. (Brackett 2011). * Data lineage is the description of data movements and transformations at various abstraction levels along data chains and of the relationships between data at these levels (Steenbeek, 2023). * Backward DL is data lineage that describes where data come from. * Forward DL is data lineage that describes where data are used. * Horizontal DL is what is generally seen as data lineage. * Vertical DL is data lineage that describes the relationship between the concept data model, logical data model and application data model. ===Notes 2=== Data lineage answers the 5 W’s of data: - Where does the data come from or where does it go? - Who uses it? - When was it created? - What information does it contain? What transformations are executed? - Why does it exist? === Synomym === Data chain === Purposes === * To be able to conduct an impact analysis when making changes to data structures, data flows, or data processing. * To identify opportunities for improvement in the existing data flow. * To be able to investigate root causes of data issues. * To be able to determine the reliability of data, based on its origin. * To identify personal information (GPDS). * To support migration of applications and to identify “dead-ends”. * To be compliant with standards that require DL. * To enable Data Lifecycle Management === Life cycle === ^ Phase ^ Activity ^ | Plan | * Define the scope \\ * Select a way to store the DL information, e.g., by an editor or in a DL tool \\ * Collect the relevant metadata \\ * Enter, change, or delete the metadata \\ * "Stitch the nodes" | | Do | * Use the DL for its purpose | | Check | * Evaluate the effectiveness of the DL | | Act | * Adapt the DL\\ * Maintain the DL | ===Characteristics and requirements=== ^ Characteristic ^ Requirement ^ | Completeness | DL is complete regarding the scope. | | Maintainability | DL can be maintained efficiently. | | Clarity| DL can be interpreted easily (zooming, filtering) | === Relations=== | Data lineage | is parent of | [[data_concept/backward_data_lineage|backward data lineage]] | | Data lineage | is parent of | [[data_concept/forward_data_lineage|forward data lineage]] | | Data lineage | is parent of | [[data_concept/horizontal_data_lineage|horizontal data lineage]] | | Data lineage | is parent of | [[data_concept/vertical data lineage|vertical data lineage]] | | Data lineage | is an element of a | [[data_quality_general/data_quality_management_system|data quality management system]] | | Data lineage | is part of the | business or technical [[data_quality_management_system/metadata|metadata]] | | Data lineage | includes a set of | data elements but especially [[data_quality_management_system/critical_data_element|critical data elements]] | | Data lineage | facilitates the root cause analysis of | [[data_quality_management_system:data_quality_issue|data issues]] | {{:data_management:data_quality:data_lineage.jpg?250|}} === Example(s) === Example 1: Horizonal data lineage {{ :data_management:data_integration_and_interoperability:horizontal_data_lineage.png?600&direct |}} Example 2: Horizontal data lineage {{ :data_management:data_integration_and_interoperability:horizontal_data_lineage_2.png?600&direct |}} Example 3: Horizontal and vertical data lineage {{ :data_management:data_integration_and_interoperability:horizontal_and_vertical_data_lineage.png?600&direct |}} === Story === Legislation requires the Valencia bank to report monthly to its regulator, the central bank. The regulator, however, also wants to know how these reports have been produced and where the data comes from. This is to assess the quality of the data. Because the reports are generated by complex data flows, the bank decides to apply data lineage to map these flows and make them visible. It soon turned out that fields with the same meaning had different names in the systems involved. Nevertheless, it was possible to link the fields and it became clear where the reported data came from. The bank can now satisfactorily inform the supervisor about the origin of the reported data. A data steward is made responsible for the maintenance of the data lineage in the tool, so that the metadata is kept up to date. Data lineage also proves to be useful when making changes to the systems. The impact of changes in the systems downstream can be understood more quickly. === Reference(s) === * Achieve data lineage in data vault 2.0. (2017, July 18). Scalefree Blog. https://blog.scalefree.com/2017/07/18/achieve-data-lineage-in-data-vault-2-0/ * Colibra. (n.d.). * The complete guide to Data Lineage. https://www.collibra.com/wp-content/uploads/Ebook-DataLineage-20200113.pdf. * DAMA (2017). DAMA-DMBOK. Data Management Body of Knowledge. 2nd Edition. Technics Publications Llc. August 2017. * DAMA Dictionary of Data Management. 2nd Edition 2011. Technics Publications, LLC, New Jersey. * DAMA-NL (2020). Data Concept System for Data Quality Dimensions (DCS). Research Paper. * Data lineage - Solidatus simplified data lineage solution. (n.d.). Solidatus - An Award-Winning Data Lineage Solution. https://www.solidatus.com/data-lineage/ * Data lineage 103: Legislative requirements. (2021, April 11). Data Crossroads. https://datacrossroads.nl/2019/03/17/data-lineage-103/ * Data lineage 104: Documenting data lineage. (2020, July 9). Data Crossroads. https://datacrossroads.nl/2019/03/20/data-lineage-104/ * Data lineage and metadata management: An innovative approach. (2019, December 18). DATAVERSITY. https://www.dataversity.net/data-lineage-and-metadata-management-an-innovative-approach/ * Data lineage. (2014, December 20). Wikipedia, the free encyclopedia. Retrieved May 21, 2021, from https://en.wikipedia.org/wiki/Data_lineage * New vision on data lineage/ flow in DAMA-DMBOK2. (2021, April 13). Data Crossroads. https://datacrossroads.nl/2017/09/10/new-vision-on-data-lineage-flow-in-dama-dm-bok-2/ * Steenbeek, Irina (2023). [[https://www.linkedin.com/pulse/data-lineage-needs-benefits-various-stakeholders-dr-irina-steenbeek|Data Lineage: the Needs of and Benefits to Various Stakeholders]] {{tag>All DUMS}}