This shows you the differences between two versions of the page.
data_quality_management_system:data_cleansing [2023/03/01 16:46] peter | data_quality_management_system:data_cleansing [2024/05/26 19:33] (current) peter | ||
---|---|---|---|
Line 1: | Line 1: | ||
===== Data cleansing ===== | ===== Data cleansing ===== | ||
- | |||
- | === Introduction === | ||
- | This factsheet describes knowledge about data cleansing in a nutshell. Data cleansing is highlighted from different angles in a structured way. | ||
=== Definition === | === Definition === | ||
- | Data cleansing is the process of detecting and correcting [[data_quality_management_system/data_issue|data issues]] to improve the quality of data to an acceptable level. | + | Data cleansing is the [[general_term/ |
===Notes=== | ===Notes=== | ||
- | What an acceptable data quality level is for an organization should be defined | + | An organization should define |
Dimensions of data quality that can be improved by data cleansing are: | Dimensions of data quality that can be improved by data cleansing are: | ||
Line 16: | Line 13: | ||
* [[data_quality_dimension/ | * [[data_quality_dimension/ | ||
* [[data_quality_dimension/ | * [[data_quality_dimension/ | ||
- | * [[data_quality_dimension/ | + | * [[data_quality_management_system: |
* [[data_quality_dimension/ | * [[data_quality_dimension/ | ||
- | * Currency of data values | + | * [[data_quality_dimension/ |
* [[data_quality_dimension/ | * [[data_quality_dimension/ | ||
* [[data_quality_dimension/ | * [[data_quality_dimension/ | ||
Line 34: | Line 31: | ||
=== Purpose === | === Purpose === | ||
- | To detect and correct [[data_quality_management_system/data_issue|data issues]] and inconsistencies. | + | To detect and correct [[data_quality_management_system: |
=== Life cycle === | === Life cycle === | ||
Line 45: | Line 42: | ||
=== Methods === | === Methods === | ||
- | The next methods of correcting [[data_quality_management_system/data_issue|data issues]] can be distinguished: | + | The following |
^ Method | ^ Method | ||
| Abbreviation expansion | | Abbreviation expansion | ||
Line 68: | Line 65: | ||
| Type conversion | | Type conversion | ||
| Edit rules | Edit rules, a new class of data quality rules, specify how to fix errors, i.e. which attributes are wrong and what values they should take. | | Edit rules | Edit rules, a new class of data quality rules, specify how to fix errors, i.e. which attributes are wrong and what values they should take. | | ||
- | | Data lifecycle management | + | | Data lifecycle management |
Note 4: Data issue prevention is far superior to data issue detection and cleansing, as it is cheaper and more efficient to prevent issues than to find and correct them later. | Note 4: Data issue prevention is far superior to data issue detection and cleansing, as it is cheaper and more efficient to prevent issues than to find and correct them later. | ||
Line 96: | Line 93: | ||
| Cost-effectiveness of data cleansing | Data cleansing must lead to a positive business case, i.e. the benefits must outweigh the costs. | Cost-effectiveness of data cleansing | Data cleansing must lead to a positive business case, i.e. the benefits must outweigh the costs. | ||
- | === Relationss | + | === Relations |
- | | + | |Data cleansing| is child of |[[general_term/ |
- | | + | |Data cleansing| is an element of a |[[data_quality_general: |
- | * Purpose of data cleansing | + | |Data cleansing| |
- | * A data cleansing | + | |Data cleansing|is the successor of|[[data_quality_management_system/ |
- | * [[data_quality_management_system/ | + | |Data cleansing|uses|data cleansing |
+ | |Data cleansing|will be applied firstly | ||
+ | |Data cleansing|improves|[[data_quality_general/ | ||
+ | |Data cleansing|needs|[[data_quality_management_system/ | ||
{{: | {{: | ||
Line 164: | Line 165: | ||
What is data cleansing? Guide to data cleansing tools, services and strategy. (2020, August 13). Talend Real-Time Open Source Data Integration Software. https:// | What is data cleansing? Guide to data cleansing tools, services and strategy. (2020, August 13). Talend Real-Time Open Source Data Integration Software. https:// | ||
+ | |||
+ | {{tag> | ||
+ |
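As a rough illustration of three correction methods named in the Methods table (abbreviation expansion, type conversion and edit rules), the sketch below cleanses a single record. It is not taken from the factsheet: the field names, the abbreviation dictionary and the example edit rule are all hypothetical and chosen only to show the idea.

```python
from typing import Any, Optional

# Hypothetical abbreviation dictionary; an assumption for illustration only.
ABBREVIATIONS = {"st.": "Street", "ave.": "Avenue", "rd.": "Road"}


def expand_abbreviations(text: str) -> str:
    """Abbreviation expansion: replace known abbreviations with full words."""
    words = text.split()
    return " ".join(ABBREVIATIONS.get(word.lower(), word) for word in words)


def to_int(value: Any) -> Optional[int]:
    """Type conversion: turn a value into an int, or None if that fails."""
    try:
        return int(str(value).strip())
    except ValueError:
        return None


def apply_edit_rule(record: dict) -> dict:
    """Edit rule (illustrative): the rule names the wrong attribute and its fix.

    Assumed rule: if the city is Amsterdam, the country code must be NL.
    """
    fixed = dict(record)
    if fixed.get("city") == "Amsterdam" and fixed.get("country") != "NL":
        fixed["country"] = "NL"
    return fixed


def cleanse(record: dict) -> dict:
    """Apply the three methods in sequence to a copy of the record."""
    cleaned = dict(record)
    cleaned["street"] = expand_abbreviations(cleaned["street"])
    cleaned["house_number"] = to_int(cleaned["house_number"])
    return apply_edit_rule(cleaned)


raw = {"street": "Main St.", "house_number": " 12 ", "city": "Amsterdam", "country": "DE"}
result = cleanse(raw)
```

Each method is kept as a separate function so that an organization could switch individual corrections on or off depending on the data quality level it has defined as acceptable.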