Data Mining: Fast Algorithms vs. Fast Results
Exploratory data analysis is typically an iterative, multi-step process in which data is cleaned, scaled, and integrated, and various algorithms are applied to arrive at interesting insights. Most algorithmic research has concentrated on algorithms for a single step in this process, e.g., algorithms for constructing a predictive model from training data. However, the speed of an individual algorithm is rarely the bottleneck in a data mining project. The limiting factor is usually the difficulty of understanding the data, exploring numerous alternatives, and managing the analysis process and intermediate results. The alternatives include the choice of mining techniques, how they are applied, and to which subsets of the data they are applied; because these choices multiply, the number of potential analysis steps explodes rapidly.
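To give a rough sense of how quickly the alternatives grow, the following minimal sketch counts the candidate analysis steps for a small, invented set of choices. The technique names, parameter settings, and data subsets are hypothetical placeholders, not taken from any particular project, and the helper functions `candidate_steps` and `count_pipelines` are illustrative only.

```python
from itertools import product

# Hypothetical choices at each decision point (illustrative assumptions only).
techniques = ["decision_tree", "k_means", "association_rules"]
parameter_settings = ["default", "tuned_a", "tuned_b", "tuned_c"]
data_subsets = ["all", "region_north", "region_south", "last_year", "new_customers"]


def candidate_steps(*choice_lists):
    """Enumerate every combination of one choice from each list."""
    return list(product(*choice_lists))


def count_pipelines(options_per_step, depth):
    """Count sequences of `depth` steps when each step has the same number of options."""
    return options_per_step ** depth


steps = candidate_steps(techniques, parameter_settings, data_subsets)
print(len(steps))                       # 3 * 4 * 5 = 60 candidate single steps
print(count_pipelines(len(steps), 3))   # 60 ** 3 = 216000 three-step sequences
```

Even with only a handful of options per decision, a three-step analysis already has hundreds of thousands of possible paths, which is why exploring and tracking alternatives, rather than running any single algorithm, tends to dominate the effort.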