Data Scrubbing

Christen, Peter

doi:10.1007/978-1-4899-7993-3_80621-1

Synonyms

Data cleansing

Definition

Data scrubbing refers to the task of first identifying data that are corrupted, incomplete, invalid, missing, inconsistent, outdated, duplicated, or irrelevant and then either correcting or removing such “dirty” data. The aim of data scrubbing is to make data more accurate, more complete, and consistent both within and across different tables in a database or data warehouse.
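The identify-then-correct-or-remove cycle described above can be illustrated with a minimal sketch. The field names, the choice of required fields, and the duplicate rule (case-insensitive match on those fields) are assumptions made for illustration only; they are not taken from the entry.

```python
from typing import Iterable

# Hypothetical schema: which fields must be present is an assumption.
REQUIRED_FIELDS = ("name", "city")


def scrub(records: Iterable[dict]) -> tuple[list[dict], list[dict]]:
    """Split records into (clean, rejected) after a simple correction step."""
    clean, rejected, seen = [], [], set()
    for rec in records:
        # Correct: trim stray whitespace from string values.
        fixed = {k: v.strip() if isinstance(v, str) else v for k, v in rec.items()}
        # Identify: missing or empty required values.
        if any(not fixed.get(f) for f in REQUIRED_FIELDS):
            rejected.append(fixed)
            continue
        # Identify: duplicates, compared case-insensitively on the required fields.
        key = tuple(str(fixed[f]).lower() for f in REQUIRED_FIELDS)
        if key in seen:
            rejected.append(fixed)
            continue
        seen.add(key)
        clean.append(fixed)
    return clean, rejected
```

Rejected records are returned rather than silently dropped, so that a later step (or a person) can decide whether they should be corrected instead of removed.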

An important challenge of data scrubbing is that “dirty” values do not necessarily violate any database requirements, i.e., such values are consistent with the design of a database and its schema. Rather, errors occur at a higher conceptual level. Examples include credit card numbers that follow the correct format of four groups of four digits but that are invalid with regard to a checksum algorithm, or addresses that have a valid ZIP code value that is inconsistent with the town and state names in the same record. Such errors can occur because of a lack of checks and validation...
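The two examples can be made concrete with a short sketch. It assumes that the checksum in question is the Luhn algorithm, which is the one commonly used for payment card numbers (the entry does not name it), and it uses a one-row toy reference table for the address check. All function names, field names, and the table contents are hypothetical.

```python
import re

# Format-level rule: four groups of four digits separated by single spaces.
CARD_FORMAT = re.compile(r"\d{4}( \d{4}){3}")


def has_card_format(value: str) -> bool:
    """True if the value matches the four-times-four digit layout."""
    return CARD_FORMAT.fullmatch(value) is not None


def luhn_valid(value: str) -> bool:
    """True if the digits in the value pass the Luhn checksum."""
    digits = [int(ch) for ch in value if ch.isdigit()]
    if not digits:
        return False
    total = 0
    # Walk from the rightmost digit; double every second digit.
    for i, d in enumerate(reversed(digits)):
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


# Toy reference table (assumption): a real system would use a postal data set.
ZIP_REFERENCE = {"2601": ("Canberra", "ACT")}


def address_consistent(record: dict) -> bool:
    """True if town and state agree with the reference entry for the ZIP code.

    ZIP codes missing from the reference table cannot be verified and are
    reported as inconsistent.
    """
    expected = ZIP_REFERENCE.get(record["zipcode"])
    return expected is not None and (record["town"], record["state"]) == expected
```

For instance, "4111 1111 1111 1112" passes `has_card_format` but fails `luhn_valid`, which is exactly the kind of value that satisfies the format rule while still being dirty.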

Author information

Authors and Affiliations

Research School of Computer Science, The Australian National University, Canberra, 2601, Australia
Peter Christen

Corresponding author

Correspondence to Peter Christen.

Editor information

Editors and Affiliations

Georgia Institute of Technology College of Computing, Atlanta, Georgia, USA
Ling Liu
University of Waterloo School of Computer Science, Waterloo, Ontario, Canada
M. Tamer Özsu

Section Editor information

Information Systems, Hasso-Plattner-Institute, Prof.-Dr.-Helmert-Str. 2-3, 14482, Potsdam, Germany
Felix Naumann

About this entry

Cite this entry

Christen, P. (2016). Data Scrubbing. In: Liu, L., Özsu, M. (eds) Encyclopedia of Database Systems. Springer, New York, NY. https://doi.org/10.1007/978-1-4899-7993-3_80621-1

DOI: https://doi.org/10.1007/978-1-4899-7993-3_80621-1
Received: 16 September 2015
Accepted: 29 April 2016
Published: 06 February 2017
Publisher Name: Springer, New York, NY
Online ISBN: 978-1-4899-7993-3
eBook Packages: Springer Reference Computer Sciences, Reference Module Computer Science and Engineering
