Encyclopedia of Machine Learning

2010 Edition | Editors: Claude Sammut, Geoffrey I. Webb

Data Preparation

  • Geoffrey I. Webb
Reference work entry
DOI: https://doi.org/10.1007/978-0-387-30164-8_194



Before data can be analyzed, they must be organized into an appropriate form. Data preparation is the process of manipulating and organizing data prior to analysis.

Motivation and Background

Data are collected for many purposes, not necessarily with machine learning in mind. Consequently, the data relevant to a given analytic purpose must often be identified and extracted. Every learning system has specific requirements about how data must be presented for analysis, and hence data must be transformed to fulfill those requirements. Further, the selection of the specific data to be analyzed can greatly affect the models that are learned. For these reasons, data preparation is a critical part of any machine learning exercise, and it is often the most time-consuming part of any nontrivial machine learning project.

Processes and Techniques

The manner in which data are prepared varies greatly depending upon the...

Recommended Reading

  1. Pyle, D. (1999). Data preparation for data mining. San Francisco: Morgan Kaufmann.
  2. Witten, I. H., & Frank, E. (2005). Data mining: Practical machine learning tools and techniques (2nd ed.). San Francisco: Morgan Kaufmann.

Copyright information

© Springer Science+Business Media, LLC 2011
