Data Preparation

  • Reference work entry
Encyclopedia of Machine Learning and Data Mining

Abstract

Before data can be analyzed, they must be organized into an appropriate form. Data preparation is the process of manipulating and organizing data prior to analysis. It is typically iterative: raw data, which are often unstructured and messy, are manipulated into a more structured and useful form that is ready for further analysis. The process consists of a series of major activities (or tasks), including data profiling, cleansing, integration, and transformation.
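As a rough illustration of the four activities named above, the following minimal sketch runs a toy data set through profiling, cleansing, integration, and transformation using only the Python standard library. The records, field names, and function names are hypothetical, and the specific choices (dropping exact duplicates, summing order amounts per customer, imputing a missing age with the mean) are assumptions made for the example rather than methods prescribed by the entry.

```python
from statistics import mean

# Hypothetical raw records: stray whitespace, inconsistent case,
# a missing age, and an exact duplicate row.
raw_customers = [
    {"id": "1", "name": " Alice ", "age": "34"},
    {"id": "2", "name": "bob", "age": ""},
    {"id": "2", "name": "bob", "age": ""},
    {"id": "3", "name": "Carol", "age": "29"},
]
raw_orders = [
    {"customer_id": "1", "amount": "20.50"},
    {"customer_id": "3", "amount": "5.00"},
    {"customer_id": "1", "amount": "4.50"},
]


def profile(records):
    """Profiling: count empty values per field to see where the data are messy."""
    fields = records[0].keys()
    return {f: sum(1 for r in records if not r[f].strip()) for f in fields}


def cleanse(records):
    """Cleansing: trim whitespace, normalise name case, drop exact duplicates."""
    seen, cleaned = set(), []
    for r in records:
        row = (r["id"].strip(), r["name"].strip().title(), r["age"].strip())
        if row not in seen:
            seen.add(row)
            cleaned.append({"id": row[0], "name": row[1], "age": row[2]})
    return cleaned


def integrate(customers, orders):
    """Integration: attach each customer's total order amount from a second source."""
    totals = {}
    for o in orders:
        key = o["customer_id"]
        totals[key] = totals.get(key, 0.0) + float(o["amount"])
    return [{**c, "total": totals.get(c["id"], 0.0)} for c in customers]


def transform(records):
    """Transformation: cast types; impute missing ages with the mean (an assumption)."""
    fill = mean(int(r["age"]) for r in records if r["age"])
    return [
        {
            "id": int(r["id"]),
            "name": r["name"],
            "age": int(r["age"]) if r["age"] else fill,
            "total": r["total"],
        }
        for r in records
    ]


missing = profile(raw_customers)
prepared = transform(integrate(cleanse(raw_customers), raw_orders))
```

In practice the steps are revisited repeatedly: the profiling output (here, two empty age values) informs which cleansing and transformation rules are needed, and those rules are then refined after inspecting the result.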

Recommended Reading

  • Barnett V, Lewis T (1994) Outliers in statistical data, 3rd edn. Wiley series in probability and mathematical statistics. Applied probability and statistics. Wiley, Chichester/New York

  • Crown (2015) Data science report. http://visit.crowdflower.com/2015-data-scientist-report.html

  • Dasu T, Johnson T (2003) Exploratory data mining and data cleaning, vol 479. Wiley, New York

  • Data science report (2014) http://visit.crowdflower.com/2015-data-scientist-report.html

  • Doan A, Halevy A, Ives Z (2012) Principles of data integration. Morgan Kaufmann, Waltham

  • For Big-Data Scientists (2014) ‘Janitor Work’ Is Key Hurdle to Insights. http://www.nytimes.com/2014/08/18/technology/for-big-data-scientists-hurdle-to-insights-is-janitor-work.html? r=0&module=ArrowsNav&contentCollection=Technology&action=keypress&region=FixedLeft&pgtype=article (The NYT article by Steve Lohr)

  • García S, Luengo J, Herrera F (2015) Data preprocessing in data mining. Springer, Cham

  • Han J, Pei J, Kamber M (2011) Data mining: concepts and techniques. Elsevier, Burlington

  • Müller H, Freytag J-C (2005) Problems, methods, and challenges in comprehensive data cleansing. Professoren des Inst. Für Informatik, Berlin

  • Pyle D (1999) Data preparation for data mining. Morgan Kaufmann, San Francisco

  • Tukey JW (1977) Exploratory data analysis, pp 2–3

  • Witten IH, Frank E (2005) Data mining: practical machine learning tools and techniques, 2nd edn. Morgan Kaufmann, Amsterdam/Boston

Corresponding author

Correspondence to Geoffrey I. Webb.

Copyright information

© 2017 Springer Science+Business Media New York

Cite this entry

Abdallah, Z.S., Du, L., Webb, G.I. (2017). Data Preparation. In: Sammut, C., Webb, G.I. (eds) Encyclopedia of Machine Learning and Data Mining. Springer, Boston, MA. https://doi.org/10.1007/978-1-4899-7687-1_62
