Encyclopedia of Database Systems

Living Edition
Editors: Ling Liu, M. Tamer Özsu

Data Scrubbing

Living reference work entry
DOI: https://doi.org/10.1007/978-1-4899-7993-3_80621-1



Data scrubbing refers to the task of first identifying data that are corrupted, incomplete, invalid, missing, inconsistent, outdated, duplicated, or irrelevant and then either correcting or removing such “dirty” data. The aim of data scrubbing is to make data more accurate, more complete, and consistent both within and across different tables in a database or data warehouse.

An important challenge of data scrubbing is that “dirty” values do not necessarily violate any database constraints, i.e., such values are consistent with the design of a database and its schema. Rather, the errors occur at a higher conceptual level. Examples include credit card numbers that follow the correct format of four groups of four digits but are invalid with regard to a checksum algorithm, or addresses whose ZIP code is valid in itself but inconsistent with the town and state names in the same record. Such errors can occur because of a lack of checks and validation...
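The two examples above can be illustrated with a minimal sketch in Python. It assumes that the checksum in question is the Luhn algorithm, which is commonly used for payment card numbers; the entry itself does not name a specific algorithm. The function names, the record layout, and the small `ZIP_REFERENCE` table are hypothetical and stand in for a real reference data set.

```python
import re

# Format check only: four groups of four digits, optionally separated
# by the same space or hyphen throughout.
CARD_FORMAT = re.compile(r"\d{4}([ -]?)\d{4}\1\d{4}\1\d{4}")


def card_is_well_formed(number: str) -> bool:
    """Schema-level check: does the value look like a card number?"""
    return CARD_FORMAT.fullmatch(number) is not None


def luhn_valid(number: str) -> bool:
    """Conceptual-level check: do the digits pass the Luhn checksum?

    Assumption: Luhn is the relevant checksum; other identifiers
    may use different schemes.
    """
    digits = [int(ch) for ch in number if ch.isdigit()]
    if len(digits) < 2:
        return False
    total = 0
    # Walk from the rightmost digit and double every second digit,
    # subtracting 9 whenever the doubled value exceeds 9.
    for i, d in enumerate(reversed(digits)):
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


# Hypothetical reference table mapping ZIP codes to (town, state).
# A real system would load this from an authoritative postal data set.
ZIP_REFERENCE = {
    "02139": ("Cambridge", "MA"),
}


def zip_consistent(record: dict):
    """Return True/False if the ZIP code can be checked, else None."""
    expected = ZIP_REFERENCE.get(record.get("zip"))
    if expected is None:
        return None  # unknown ZIP code: no decision possible
    town, state = expected
    return (record.get("town", "").strip().lower() == town.lower()
            and record.get("state", "").strip().upper() == state)
```

A value such as `"4111 1111 1111 1112"` passes `card_is_well_formed` but fails `luhn_valid`, which is the kind of error that a schema constraint alone would not catch. Returning `None` for an unknown ZIP code keeps "cannot be verified" separate from "inconsistent", so that such records can be routed to manual review rather than removed.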




Copyright information

© Springer Science+Business Media New York 2016

Authors and Affiliations

  1. Research School of Computer Science, The Australian National University, Canberra, Australia