A Wrapper Evolutionary Approach for Supervised Multivariate Discretization: A Case Study on Decision Trees

  • Sergio Ramírez-Gallego
  • Salvador García
  • José Manuel Benítez
  • Francisco Herrera
Conference paper
Part of the Advances in Intelligent Systems and Computing book series (AISC, volume 403)

Abstract

Discretization transforms numerical attributes into discrete ones. It makes it possible to use learning algorithms that require discrete input, and it helps experts understand the data more easily. Because classification problems often involve strong interactions among multiple attributes, we propose using evolutionary algorithms to select a subset of cut points for multivariate discretization, guided by a wrapper fitness function. The proposed algorithm is compared with the best state-of-the-art discretizers using two decision-tree-based classifiers: C4.5 and PUBLIC. The reported results indicate that our proposal outperforms the other discretizers in accuracy while requiring fewer intervals.
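
The idea of selecting cut points with an evolutionary search and a wrapper fitness can be illustrated with a minimal Python sketch. It is not the authors' algorithm: the abstract does not give the chromosome encoding, the evolutionary operators, or the exact fitness formula, so everything below is an assumption made for illustration. In particular, the sketch assumes a binary chromosome with one bit per candidate cut point, a plain elitist genetic algorithm, a majority-class-per-cell lookup in place of C4.5 or PUBLIC, resubstitution (training) accuracy, and a hypothetical weight `alpha` that trades accuracy against the fraction of cut points removed. All function names are hypothetical.

```python
import random
from bisect import bisect_right
from collections import Counter, defaultdict


def candidate_cuts(column):
    """Midpoints between consecutive distinct values of one attribute."""
    vals = sorted(set(column))
    return [(a + b) / 2 for a, b in zip(vals, vals[1:])]


def decode(chromosome, candidates):
    """Split the flat bit list into the selected cut points of each attribute."""
    selected, pos = [], 0
    for cuts in candidates:
        bits = chromosome[pos:pos + len(cuts)]
        selected.append([c for c, b in zip(cuts, bits) if b])
        pos += len(cuts)
    return selected


def discretize(row, cuts_per_attr):
    """Replace each numeric value by the index of the interval it falls in."""
    return tuple(bisect_right(cuts, v) for v, cuts in zip(row, cuts_per_attr))


def fitness(chromosome, X, y, candidates, alpha=0.7):
    """Assumed wrapper fitness: accuracy weighted against cut-point reduction."""
    cuts = decode(chromosome, candidates)
    cells = defaultdict(Counter)
    for row, label in zip(X, y):
        cells[discretize(row, cuts)][label] += 1
    # Stand-in classifier: predict the majority class of each cell.
    # This is training accuracy, used only to keep the sketch short.
    accuracy = sum(max(c.values()) for c in cells.values()) / len(y)
    reduction = 1 - sum(chromosome) / len(chromosome)
    return alpha * accuracy + (1 - alpha) * reduction


def evolve(X, y, generations=30, pop_size=20, seed=0):
    """Simple elitist GA over cut-point subsets (needs at least one candidate cut)."""
    rng = random.Random(seed)
    candidates = [candidate_cuts(col) for col in zip(*X)]
    n_bits = sum(len(c) for c in candidates)

    def score(ch):
        return fitness(ch, X, y, candidates)

    pop = [[rng.randint(0, 1) for _ in range(n_bits)] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=score, reverse=True)
        parents = pop[:pop_size // 2]  # keep the better half unchanged
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            child = [rng.choice(pair) for pair in zip(a, b)]  # uniform crossover
            i = rng.randrange(n_bits)
            child[i] = 1 - child[i]  # flip one random bit
            children.append(child)
        pop = parents + children
    best = max(pop, key=score)
    return decode(best, candidates), score(best)


# Toy data with an XOR-like interaction: neither attribute separates
# the classes alone, but one cut on each attribute does so jointly.
X = [(1.0, 1.0), (1.2, 0.9), (1.1, 3.0), (0.9, 3.2),
     (3.0, 1.1), (3.1, 0.8), (2.9, 3.1), (3.2, 2.9)]
y = [0, 0, 1, 1, 1, 1, 0, 0]

if __name__ == "__main__":
    cuts, best_score = evolve(X, y)
    print("selected cuts per attribute:", cuts)
    print("fitness:", round(best_score, 3))
```

Because the fitness evaluates all attributes' cut points together, a chromosome that keeps one cut on each attribute of the toy data scores higher than one that keeps no cuts, even though any single cut on its own does not improve accuracy. This is the kind of attribute interaction that motivates a multivariate search. A more faithful experiment would replace the lookup classifier with a decision tree and estimate accuracy on held-out data.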

Keywords

Discretization, Numerical attributes, Evolutionary algorithms, Data preprocessing, Classification

Copyright information

© Springer International Publishing Switzerland 2016

Authors and Affiliations

  • Sergio Ramírez-Gallego (1)
  • Salvador García (1)
  • José Manuel Benítez (1)
  • Francisco Herrera (1)
  1. Department of Computer Science and Artificial Intelligence, University of Granada, Granada, Spain
