===== Data quality rule ===== === Definition === A data quality rule is a [[general_term/rule|rule]] used to validate [[data_concept:data value|data values]] or [[data_concept:data record|data records]]. === Synonyms === * Data validation rule * Data quality rule specification === Purpose=== The purpose of a data quality rule is to prevent, monitor or report insufficient quality of [[data_concept:data value|data values]] and [[data_concept:data record|data records]]. === Life cycle === ^ Phase ^ Activity ^ | Plan | * Derive data quality rules from business rules\\ * Collect/identify data quality rules\\ * Specify data quality rules\\ * Document data quality rules as metadata\\ * Establish data quality rules | | Do | * Implement/apply data quality rules in a database or application. | | Check | * Test data quality rules\\ * Evaluate data quality rules | | Act | * Adapt/maintain data quality rules | === Characteristics and requirements === ^ Characteristic ^ Requirement ^ | Accessibility | Data quality rules are accessible to [[role:data_user|data users]]. It ensures that [[role:data_user|data users]] can ask questions about it and provide feedback on rules. | | Unambiguity | Data quality rules are unambiguous. It ensures that they can be implemented in a database or application. | |Maintainability|Data quality can efficiently being maintained.| === Relations === | Data quality rule | is child of | [[data_quality_management_system:metadata|metadata]] | | Data quality rule | is child of | [[general_term/rule|rule]] | | Data quality rule | is element of a | [[data_quality_general/data_quality_management_system|data quality management system]] | | Data quality rule | is associated with specific | [[data_quality_general:data_quality_dimension|data quality dimensions]] | | Data quality rule | can be applied to | [[data_concept:data_value|data values]] | | Data quality rule | can be applied to | [[data_quality_dimension/accuracy|accuracy]] | | Data quality rule | can be applied to | [[data_quality_dimension/completeness|completeness]] | | Data quality rule | can be applied to | [[data_quality_dimension/consistency|consistency]] | | Data quality rule | can be applied to | [[data_concept:data_record|data records]] | | Data quality rule | can be applied to | [[data_quality_dimension/uniqueness|uniqueness]] | | Data quality rule | can be integrated in a | database | | Data quality rule | can be integreted in an | application | | Data quality rule | is derived from a | [[general_term/business_rule|business rule]] | | Data quality rule | prevents | [[data_quality_management_system:data_quality_issue|data issues]] | | Data quality rule | is applied firstly to | [[data_quality_management_system:critical_data_element|critical data elements]] | | Data quality rule | is needed for | [[data_quality_management_system/data_quality_monitoring|data quality monitoring]] | | Data quality rule | is needed for | [[data_quality_management_system/data_cleansing|data cleansing]] | {{:data_management:data_quality:data_quality_rules.jpg?600|}} === Classification A === In table 1 Data quality rules are classified in three categories, and their subcategories. Table 1: Category, subcategories, and examples. ^ Category and subcategory ^ Example ^ ^ **Simple data element content rules.** \\ These are considered “simple” because you only need to inspect the contents of a single data element and check to see if the content meets the rules. || | Valid values, range, data type, pattern, and domain. | | | Optional versus mandatory (evaluates completeness). | | | Reasonable distribution of values. | In a customer database, you would expect a fairly even distribution of birthdays; a much larger number of birthdays on a given day of the year probably indicates a problem. | ^ **Cross data element validation rules.** \\ The rules require inspecting values in multiple data elements (typically in a single data file) to determine whether the data meets the quality rules. || | Valid values that depend on other column values | An overall list of location codes might pass the simple-column content rules, but only a smaller list of locations is valid if, for example, the region code is set to “West.” | | Optional becomes mandatory when other columns contain certain data. | The Value of Collateral field may be optional, but if the loan type is “mortgage,” a positive value must be filled into the Value of Collateral field. | | Mandatory becomes null when other columns contain certain data. | The Writing Insurance Agent Name field might normally be mandatory, but if the Origination Point is “web” (indicating the customer applied for the policy online), the Writing Insurance Agent Name field must then be blank. | | Cross-table validation rule. As the name suggests, these Data quality rules check columns (and combinations of columns) across tables. | An example cross-validates the name of a city with the name of a state in an address table—that is, Minneapolis is not in Wisconsin. | ^ **Cross data files validation rules.** \\ As the name suggests, these Data quality rules check data elements (and combinations of data elements) across data files. || | Mandatory presence of foreign-key relationships | If an account must have a customer, then the account table must have value in the Customer ID column that matches a value in the Customer ID column of the Customer table. | | Optional presence of foreign-key relationships depending on other data | If the Loan Type is “mortgage” in the Loan table, there must be a matching value for Loan_ID in the Collateral table. On the other hand, if the Loan_Type is “unsecured,” then there must not be a matching value for Loan_ID in the Collateral table because “unsecured” means there is no collateral for the loan. | | Columns in different tables are consistent | If the Collateral_Value column contains a value above a certain level, the Appraisal_Type must be “in person” because of the high value of the property. | === Classification B === In Table 2 Data quality rules are classified in ten categories. Table 2: Category of Data quality rule, description, and example ^ Category ^ Description ^ Example ^ | Domain List | A domain list rule defines a list of values that a data element is allowed to have. | The Gender data element can have 'M' or 'F'. | | Domain Pattern List | A domain pattern list rule defines a list of patterns that a data element is allowed to conform to. The patterns are defined in the regular expression syntax. | An example pattern for a telephone number is as follows: %%(^[[::space]]*[0-9]{ 3 }[[::punct|:space:]]?[0-9]{ 4 }[[::space]]*$)%% | | Domain Range | A domain range rule defines a range of values that a data element is allowed to have. | The value of the salary data element can be between 100 and 10000. | | Common Format / Pattern | A common format rule defines a known common format that a data element is allowed to conform to. | This rule type has many subtypes: Telephone Number, IP Address, SSN, URL, E-mail Address. | | No Nulls | A no nulls rule specifies that the data element cannot have null values | The department_id data element for an employee in the Employees table cannot be null. | | Functional Dependency | A functional dependency defines that the data in the data object may be normalized or derived | | | Unique Key | A unique key data rule defines whether a data element or group of data elements are unique in the given data object. | The name of a department should be unique. | | Referential | A referential data rule defines the type of a relationship (1:x) a value must have to another value. | The department_id data element of the Departments table should have a 1:n relationship with the department_id data element of the Employees table. | | Name and address | A name and address data rule evaluate a group of data elements as a name or address | | | Custom | A custom data rule applies a SQL expression that you specify to its input parameters. | VALID_DATE with two input parameters, START_DATE and END_DATE. A valid expression for this rule is: “THIS”.“END_DATE” > “THIS”.“START_DATE | === Examples A === Table 3 shows how a Data quality rule is derived from a business rule. Table 3: Example of a Data quality rules derived from Business Rule ^ Business Rule ^ Data quality rule ^ | The marital status code may have values of single, married, widowed, and divorced. It may not be left blank. A value must be picked when entering a new customer. The values for widowed and divorced are tracked separately from single because risk factors are sensitive to whether the customer was previously married and is not married anymore. | “Customer.Mar_Stat_Cd” may be “S,” “M,” “W,” or “D.” \\ Blank is considered an invalid value. | === Examples B === Table 4 shows Data quality rules of two fields. Table 4: Example of Data quality rules ^ Data element ^ Data quality rules ^ | Email | * An email must contain the ‘@’ sign. \\ * ‘@’ must be used only once. \\ * An email must contain any or all of the following: letters, digits, non-alphabetic characters, such as, ! # $ % & ‘ * + – / = ? ^ _ ` { . | Customer name | * The ‘Customer’s full name’ field an email refers to must not be ‘Null’. \\ * Customer’s full name must consist only of letters; no other characters allowed. \\ * Only first letters in customer name, middle name (if any), and surname must be capitalized. | === Tips === - Involve subject matter experts from various department. - Be moderate with the number of rules. Define rules that really matter according to the subject matter expert. Don’t create rules which already are implemented in the sources system as input check. - Favour a step-by-step approach. - Treat each field of the database individually and creates rules accordingly. - Decide between a centralized and local storage for your Data quality rules. === References === Data quality rule (2023). [[https://youtu.be/XBUCfOoxlUE|YouTube video]]. DAMA-NL. DAMA (2017). DAMA-DMBOK. Data Management Body of Knowledge. 2nd Edition. Technics PublicationsLlc. August 2017. DAMA Dictionary of Data Management. Data migration checklist: Planner + template for effective data migration planning — Data migration pro. (n.d.). Data Migration Pro. https://www.datamigrationpro.com/data-quality-rules-management-repository Data quality - Data rules. (2020, March 31). Datacadamia - Data and Co. https://datacadamia.com/data/quality/data_rule Data quality and business rules explained: Expert interview with Ronald G. Ross — Data quality pro. (2020, February 29). Data Quality Pro. https://www.dataqualitypro.com/blog/business-rules-for-data-quality-ronald-g-ross Data quality rule. (n.d.). ScienceDirect.com | Science, health and medical journals, full text articles and books. https://www.sciencedirect.com/topics/computer-science/data-quality-rule Reeb, B. (n.d.). Data quality rules. IData Insights Blog. https://blog.idatainc.com/data-quality-rules Tips to create effective Data quality rules. (2016, April 29). TDAN.com. https://tdan.com/tips-to-create-effective-data-quality-rules/24525 What are Data quality rules? How do they look, and why are they needed in data management? Our expert provides the answers! (2020, September 3). Data Quality Solutions: Data Quality Software & DQaaS | CDQ. https://www.cdq.ch/data-sharing/data-quality-rules {{tag>All DQMS}}