Abstract
Before data can be analyzed, they must be organized into an appropriate form. Data preparation is the process of manipulating and organizing data prior to analysis. It is typically an iterative process that turns raw data, which are often unstructured and messy, into a more structured and useful form that is ready for further analysis. The whole preparation process consists of a series of major activities (or tasks), including data profiling, cleansing, integration, and transformation.
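As a rough illustration of how the four activities named above might fit together, the following minimal Python sketch runs a toy data set through each of them in turn. The records, the field names, and the helper functions (`profile`, `cleanse`, `integrate`, `transform`) are hypothetical and invented for this example. The specific choices (dropping records with a missing age, an inner join, min-max scaling) are assumptions, one of many possible options for each task rather than a prescribed method.

```python
from collections import Counter

# Hypothetical raw inputs, invented purely for illustration.
customers = [
    {"id": 1, "name": " Alice ", "age": "34"},
    {"id": 2, "name": "bob", "age": ""},
    {"id": 3, "name": "Carol", "age": "51"},
]
orders = [{"id": 1, "total": 20.0}, {"id": 3, "total": 80.0}]


def profile(rows):
    """Profiling: count missing (None or blank) values per field."""
    missing = Counter()
    for row in rows:
        for field, value in row.items():
            if value is None or str(value).strip() == "":
                missing[field] += 1
    return dict(missing)


def cleanse(rows):
    """Cleansing: trim and title-case names, parse ages, drop rows with no age."""
    cleaned = []
    for row in rows:
        if str(row["age"]).strip() == "":
            continue  # assumption: incomplete records are discarded, not imputed
        cleaned.append(
            {"id": row["id"], "name": row["name"].strip().title(), "age": int(row["age"])}
        )
    return cleaned


def integrate(left, right, key="id"):
    """Integration: inner join of two record lists on a shared key."""
    lookup = {row[key]: row for row in right}
    return [{**row, **lookup[row[key]]} for row in left if row[key] in lookup]


def transform(rows, field):
    """Transformation: min-max scale one numeric field into [0, 1]."""
    values = [row[field] for row in rows]
    low, high = min(values), max(values)
    span = (high - low) or 1  # avoid division by zero for a constant column
    return [{**row, field: (row[field] - low) / span} for row in rows]


missing_report = profile(customers)
prepared = transform(integrate(cleanse(customers), orders), "total")
```

In practice the steps are rarely run once in a fixed order: the profiling report usually informs the cleansing rules, and the pipeline is rerun as new problems surface, which reflects the iterative nature of the process described above.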
Recommended Reading
Barnett V, Lewis T (1994) Outliers in statistical data, 3rd edn. Wiley series in probability and mathematical statistics. Applied probability and statistics. Wiley, Chichester/New York
Crown (2015) Data science report. http://visit.crowdflower.com/2015-data-scientist-report.html
Dasu T, Johnson T (2003) Exploratory data mining and data cleaning, vol 479. Wiley, New York
Data science report (2014) http://visit.crowdflower.com/2015-data-scientist-report.html
Doan A, Halevy A, Ives Z (2012) Principles of data integration. Morgan Kaufmann, Waltham
For Big-Data Scientists (2014) ‘Janitor Work’ Is Key Hurdle to Insights. http://www.nytimes.com/2014/08/18/technology/for-big-data-scientists-hurdle-to-insights-is-janitor-work.html?_r=0&module=ArrowsNav&contentCollection=Technology&action=keypress&region=FixedLeft&pgtype=article (The NYT article by Steve Lohr)
García S, Luengo J, Herrera F (2015) Data preprocessing in data mining. Springer, Cham
Han J, Pei J, Kamber M (2011) Data mining: concepts and techniques. Elsevier, Burlington
Müller H, Freytag J-C (2005) Problems, methods, and challenges in comprehensive data cleansing. Professoren des Inst. Für Informatik, Berlin
Pyle D (1999) Data preparation for data mining. Morgan Kaufmann, San Francisco
Tukey JW (1977) Exploratory data analysis, pp 2–3
Witten IH, Frank E (2005) Data mining: practical machine learning tools and techniques, 2nd edn. Morgan Kaufmann, Amsterdam/Boston
Copyright information
© 2017 Springer Science+Business Media New York
About this entry
Cite this entry
Abdallah, Z.S., Du, L., Webb, G.I. (2017). Data Preparation. In: Sammut, C., Webb, G.I. (eds) Encyclopedia of Machine Learning and Data Mining. Springer, Boston, MA. https://doi.org/10.1007/978-1-4899-7687-1_62
DOI: https://doi.org/10.1007/978-1-4899-7687-1_62
Publisher Name: Springer, Boston, MA
Print ISBN: 978-1-4899-7685-7
Online ISBN: 978-1-4899-7687-1
eBook Packages: Computer Science, Reference Module Computer Science and Engineering