A Wrapper Evolutionary Approach for Supervised Multivariate Discretization: A Case Study on Decision Trees
The main objective of discretization is to transform numerical attributes into discrete ones. This makes it possible to use learning algorithms that require discrete data as input and helps experts understand the data more easily. Because classification problems involve strong interactions among multiple attributes, we propose using evolutionary algorithms to select a subset of cut points for multivariate discretization, guided by a wrapper fitness function. The proposed algorithm has been compared with the best state-of-the-art discretizers using two decision tree-based classifiers: C4.5 and PUBLIC. The reported results indicate that our proposal outperforms the other discretizers in accuracy while requiring fewer intervals.
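To make the idea of wrapper-driven cut-point selection concrete, the following is a minimal sketch and not the authors' implementation. It assumes a binary chromosome with one bit per candidate cut point, a plain elitist genetic algorithm, and a fitness that weights accuracy against the fraction of cut points removed. The names `candidate_cuts`, `discretize`, `fitness`, and `evolve`, the weight `alpha`, and the majority-vote cell classifier scored on the training data are all illustrative assumptions; the paper itself evaluates with C4.5 and PUBLIC.

```python
import random
from bisect import bisect_right
from collections import Counter, defaultdict

def candidate_cuts(X):
    """Midpoints between consecutive distinct values of each attribute."""
    cuts = []
    for j in range(len(X[0])):
        vals = sorted({row[j] for row in X})
        cuts += [(j, (a + b) / 2) for a, b in zip(vals, vals[1:])]
    return cuts

def discretize(X, cuts, mask):
    """Map each row to a tuple of interval indices using only the selected cuts."""
    n_attrs = len(X[0])
    per_attr = [[] for _ in range(n_attrs)]
    for (j, c), bit in zip(cuts, mask):
        if bit:
            per_attr[j].append(c)  # cuts are generated in ascending order per attribute
    return [tuple(bisect_right(per_attr[j], row[j]) for j in range(n_attrs))
            for row in X]

def fitness(mask, cuts, X, y, alpha=0.9):
    """Assumed wrapper fitness: accuracy of a majority-vote classifier over the
    discretized cells (measured on the training data), traded off against the
    fraction of candidate cuts that were discarded."""
    votes = defaultdict(Counter)
    for cell, label in zip(discretize(X, cuts, mask), y):
        votes[cell][label] += 1
    accuracy = sum(max(c.values()) for c in votes.values()) / len(y)
    reduction = 1 - sum(mask) / len(mask)
    return alpha * accuracy + (1 - alpha) * reduction

def evolve(X, y, pop_size=20, generations=30, p_mut=0.05, seed=0):
    """Simple elitist GA (a stand-in, not necessarily the paper's algorithm)."""
    rng = random.Random(seed)
    cuts = candidate_cuts(X)
    score = lambda m: fitness(m, cuts, X, y)
    pop = [[rng.randint(0, 1) for _ in cuts] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=score, reverse=True)
        parents = pop[: pop_size // 2]              # keep the better half
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            child = [rng.choice(pair) for pair in zip(a, b)]          # uniform crossover
            child = [bit ^ (rng.random() < p_mut) for bit in child]   # bit-flip mutation
            children.append(child)
        pop = parents + children
    best = max(pop, key=score)
    return [c for c, bit in zip(cuts, best) if bit], score(best)

# Toy data (hypothetical): the class depends only on whether attribute 0 exceeds 5.
X = [[1.0, 7.0], [2.0, 3.0], [3.0, 9.0], [4.0, 1.0],
     [6.0, 8.0], [7.0, 2.0], [8.0, 6.0], [9.0, 4.0]]
y = [0, 0, 0, 0, 1, 1, 1, 1]
selected, best_fit = evolve(X, y)
print(selected, round(best_fit, 4))
```

Because the fitness is computed over all attributes jointly, a cut point on one attribute is kept only if it helps in combination with the cuts selected on the others, which is the multivariate aspect the abstract emphasises. In practice the accuracy term would come from cross-validating the target classifier, not from training-set votes.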
Keywords: Discretization, Numerical attributes, Evolutionary algorithms, Data preprocessing, Classification
This work was partially supported by the Spanish Ministry of Science and Technology under project TIN2011-28488 and the Andalusian Research Plans P11-TIC-7765, P10-TIC-6858.