Data Scrubbing

Encyclopedia of Database Systems

Synonyms

Data cleansing

Definition

Data scrubbing refers to the task of first identifying data that are corrupted, incomplete, invalid, missing, inconsistent, outdated, duplicated, or irrelevant and then either correcting or removing such “dirty” data. The aim of data scrubbing is to make data more accurate, more complete, and consistent both within and across different tables in a database or data warehouse.
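The two-step process described above (identify dirty data, then correct or remove it) can be sketched as follows. This is a minimal illustration, not a method taken from the entry: it covers only missing values and duplicates, it only removes records rather than correcting them, and the names `find_issues`, `scrub`, and the sample fields are hypothetical.

```python
from typing import Optional

# A record is a flat mapping of field name to value (hypothetical layout).
Record = dict[str, Optional[str]]


def find_issues(records: list[Record], required: list[str]) -> dict[int, list[str]]:
    """Step 1: map each record index to the problems found in that record."""
    issues: dict[int, list[str]] = {}
    seen: dict[tuple, int] = {}
    for i, rec in enumerate(records):
        # A required field that is absent, None, or blank counts as missing.
        problems = [f"missing:{f}" for f in required if not (rec.get(f) or "").strip()]
        # Normalise case and whitespace so trivially different copies compare equal.
        key = tuple((rec.get(f) or "").strip().lower() for f in required)
        if key in seen:
            problems.append(f"duplicate_of:{seen[key]}")
        else:
            seen[key] = i
        if problems:
            issues[i] = problems
    return issues


def scrub(records: list[Record], required: list[str]) -> list[Record]:
    """Step 2 (removal only): keep the records that have no detected issue."""
    issues = find_issues(records, required)
    return [rec for i, rec in enumerate(records) if i not in issues]


records = [
    {"name": "Ann Lee", "city": "Perth"},
    {"name": "ann lee ", "city": "perth"},  # same person, different formatting
    {"name": "Bob Ray", "city": None},      # incomplete record
]
print(find_issues(records, ["name", "city"]))
print(scrub(records, ["name", "city"]))
```

Removing every flagged record is the bluntest possible policy; a real scrubbing step would more often try to correct a value (for example, by filling it from a trusted source) before discarding the record.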

An important challenge of data scrubbing is that “dirty” values do not necessarily violate any database constraints, i.e., such values are consistent with the design of a database and its schema. Rather, the errors occur at a higher conceptual level. Examples include credit card numbers that are correctly formatted as four groups of four digits but that fail a check-sum test, or addresses whose ZIP code is valid in itself but inconsistent with the town and state names in the same record. Such errors can occur because of a lack of checks and validation...
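Both examples can be illustrated with a short sketch. The entry does not name the check-sum algorithm; the code below assumes the Luhn algorithm, which is the check commonly applied to payment card numbers. The ZIP code lookup table `ZIP_DIRECTORY` and the function names are hypothetical stand-ins for a real reference data source.

```python
def luhn_valid(number: str) -> bool:
    """Return True if the digits in `number` pass the Luhn check (assumed algorithm)."""
    # Non-digit characters such as spaces or hyphens are ignored.
    digits = [int(c) for c in number if c.isdigit()]
    if len(digits) < 2:
        return False
    total = 0
    # Walk from the rightmost digit; double every second digit.
    for pos, d in enumerate(reversed(digits)):
        if pos % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9  # same as adding the two digits of the product
        total += d
    return total % 10 == 0


# Hypothetical reference table: ZIP code -> (town, state).
ZIP_DIRECTORY = {
    "10001": ("New York", "NY"),
    "94105": ("San Francisco", "CA"),
}


def zip_consistent(record: dict[str, str], directory: dict[str, tuple[str, str]]) -> bool:
    """Return True only if the record's town and state match its ZIP code."""
    expected = directory.get(record["zip"])
    if expected is None:
        return False  # ZIP code not in the table, so consistency cannot be confirmed
    town, state = expected
    return (record["town"].strip().lower() == town.lower()
            and record["state"].strip().upper() == state)


# Both card numbers have the correct four-by-four grouping; only the first passes.
print(luhn_valid("4111 1111 1111 1111"))
print(luhn_valid("4111 1111 1111 1112"))

# Every field is individually valid, but the combination is inconsistent.
print(zip_consistent({"zip": "10001", "town": "San Francisco", "state": "CA"}, ZIP_DIRECTORY))
```

In both cases a schema-level check (data type, length, format) would accept the value; only a check that encodes domain knowledge exposes the error.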

Author information

Correspondence to Peter Christen.

Copyright information

© 2018 Springer Science+Business Media, LLC, part of Springer Nature

Cite this entry

Christen, P. (2018). Data Scrubbing. In: Liu, L., Özsu, M.T. (eds) Encyclopedia of Database Systems. Springer, New York, NY. https://doi.org/10.1007/978-1-4614-8265-9_80621
