- Peter Christen, Research School of Computer Science, The Australian National University
Data scrubbing refers to the task of first identifying data that are corrupted, incomplete, invalid, missing, inconsistent, outdated, duplicated, or irrelevant, and then either correcting or removing such “dirty” data. The aim of data scrubbing is to make data more accurate, more complete, and more consistent, both within and across different tables in a database or data warehouse.
An important challenge of data scrubbing is that “dirty” values do not necessarily contradict any database requirements, i.e., such values are consistent with the design of a database and its schema. Rather, errors occur at a higher conceptual level. Examples include credit card numbers that follow the correct grouping of four groups of four digits but that are invalid with regard to a check-sum algorithm, or addresses that have a valid zip code value that is inconsistent with the town and state names in the same record. Such errors can occur because of a lack of checks and validation during ...
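The credit card example above can be made concrete with the Luhn check-sum, the algorithm commonly applied to card numbers: a value can match the expected four-by-four digit layout and still fail the check. A minimal sketch (the function name `luhn_valid` and the sample number are illustrative, not from the entry):

```python
def luhn_valid(number: str) -> bool:
    """Return True if the digit string passes the Luhn check-sum.

    Non-digit characters (spaces, hyphens) are ignored, so grouped
    values such as "1234 5678 ..." can be validated directly.
    """
    digits = [int(d) for d in number if d.isdigit()]
    total = 0
    # Walk the digits from the right; double every second one and
    # subtract 9 when the doubled value exceeds 9.
    for i, d in enumerate(reversed(digits)):
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0

# Both values follow the four-by-four grouping, but only the first
# passes the check-sum; the second is "dirty" at the conceptual level
# even though it satisfies any format constraint on the column.
print(luhn_valid("4539 1488 0343 6467"))  # True
print(luhn_valid("4539 1488 0343 6468"))  # False
```

A scrubbing step would flag records failing such a check for correction or removal, exactly because a schema-level format constraint alone cannot catch them.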
- Entry: Data Scrubbing
- Reference Work: Encyclopedia of Database Systems, pp 1-5
- Publisher: Springer New York, 2016
- Copyright Holder: Springer Science+Business Media New York
- Author: Peter Christen, Research School of Computer Science, The Australian National University, Canberra, 2601, Australia
- Editor Affiliations: Georgia Institute of Technology College of Computing; University of Waterloo School of Computer Science