Automated Data Pre-processing via Meta-learning
Cite this paper as: Bilalli B., Abelló A., Aluja-Banet T., Wrembel R. (2016) Automated Data Pre-processing via Meta-learning. In: Bellatreche L., Pastor Ó., Almendros Jiménez J., Aït-Ameur Y. (eds) Model and Data Engineering. MEDI 2016. Lecture Notes in Computer Science, vol 9893. Springer, Cham
A data mining algorithm may perform differently on datasets with different characteristics, e.g., it might perform better on a dataset with continuous attributes than on one with categorical attributes, or the other way around. In practice, a dataset usually needs to be pre-processed before it is mined. Taking into account all the possible pre-processing operators, there exists a staggeringly large number of alternatives, and inexperienced users are easily overwhelmed. We show that this problem can be addressed by an automated approach that leverages ideas from meta-learning. Specifically, we consider a wide range of data pre-processing techniques and a set of data mining algorithms. For each data mining algorithm and selected dataset, we are able to predict the transformations that improve the result of the algorithm on that dataset. Our approach will help non-expert users identify the transformations appropriate to their applications more effectively, and hence achieve improved results.
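The general idea of predicting useful transformations from past experience can be illustrated with a minimal sketch. It is not the method of the paper, whose meta-features, learners, and transformations are not given in the abstract. Everything here is an assumption made for illustration: the three meta-features, the nearest-neighbour predictor, the function names `meta_features` and `recommend`, the transformation labels, and the toy gains in `META_KB`, which are invented numbers and not experimental results.

```python
from math import dist, log10


def meta_features(rows):
    """Describe a dataset (list of equal-length tuples) by three simple meta-features:
    log10 of the row count, fraction of numeric attributes, fraction of missing cells."""
    n_cols = len(rows[0])
    cells = [value for row in rows for value in row]
    missing = sum(value is None for value in cells) / len(cells)
    numeric = sum(
        all(row[j] is None or isinstance(row[j], (int, float)) for row in rows)
        for j in range(n_cols)
    ) / n_cols
    return (log10(len(rows)), numeric, missing)


def recommend(meta_kb, new_features, k=2):
    """Return the transformations whose mean observed gain on the k most similar
    past datasets is positive, ordered from largest to smallest predicted gain."""
    by_transform = {}
    for features, transform, gain in meta_kb:
        distance = dist(features, new_features)
        by_transform.setdefault(transform, []).append((distance, gain))

    predicted = {}
    for transform, pairs in by_transform.items():
        nearest = sorted(pairs)[:k]
        predicted[transform] = sum(gain for _, gain in nearest) / len(nearest)

    useful = [t for t, gain in predicted.items() if gain > 0]
    return sorted(useful, key=lambda t: -predicted[t])


# Hypothetical meta-knowledge base for one mining algorithm: each record is
# (meta-features of a past dataset, transformation applied, change in accuracy).
# The values are toy numbers chosen for illustration only.
META_KB = [
    ((2.0, 1.0, 0.0), "discretize", 0.04),
    ((2.0, 1.0, 0.0), "impute_missing", 0.00),
    ((3.0, 0.2, 0.1), "discretize", -0.02),
    ((3.0, 0.2, 0.1), "impute_missing", 0.05),
    ((2.5, 0.9, 0.0), "discretize", 0.03),
    ((2.5, 0.9, 0.0), "impute_missing", -0.01),
]

# A new dataset with 100 rows, two continuous attributes, and no missing values.
rows = [(float(i), i * 0.5) for i in range(100)]
suggested = recommend(META_KB, meta_features(rows))
print(suggested)
```

In this sketch one meta-knowledge base is kept per mining algorithm, so the recommendation depends on both the algorithm and the characteristics of the dataset. A real system would use a richer set of meta-features and a learned model in place of the nearest-neighbour average.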