
Misclassification Error Impurity Measure

  • Leszek Rutkowski
  • Maciej Jaworski
  • Piotr Duda
Chapter
Part of the Studies in Big Data book series (SBD, volume 56)

Abstract

One way of solving the problem of incompatibility between nonlinear split measures, such as the information gain or the Gini gain, and Hoeffding's inequality is to apply another statistical tool, e.g. McDiarmid's inequality. Another way is to find a split measure that can be expressed as an arithmetic average of random variables, since Hoeffding's inequality is applicable in that case. Many impurity measures other than the information entropy or the Gini index have been considered in the literature [1]. For example, in [2, 3] the Kearns-Mansour index [4] was taken into account. A survey of various splitting criteria can be found in [5]. However, these functions are nonlinear and cannot be expressed as the desired sum of random variables. In this chapter, a split measure based on the misclassification error impurity measure is proposed [6, 7], which has the above-mentioned property. In the case of the misclassification error, the bounds obtained using Hoeffding's inequality and McDiarmid's inequality are equivalent.
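To illustrate the key property discussed above, the sketch below (function names are illustrative, not from the chapter) shows that the misclassification error impurity is an arithmetic average of indicator variables 1[y_i ≠ majority class], which is exactly the form required by Hoeffding's inequality for a mean of bounded i.i.d. variables.

```python
import math
from collections import Counter


def misclassification_impurity(labels):
    """Misclassification error impurity of a node: the fraction of
    samples not belonging to the majority class. This equals the
    arithmetic mean of the indicators 1[y_i != majority class],
    so Hoeffding's inequality applies directly to it."""
    counts = Counter(labels)
    n = len(labels)
    return 1.0 - max(counts.values()) / n


def hoeffding_epsilon(n, delta, value_range=1.0):
    """Hoeffding bound: for the mean of n i.i.d. variables taking
    values in an interval of length value_range, the true mean lies
    within epsilon of the empirical mean with probability >= 1 - delta."""
    return value_range * math.sqrt(math.log(1.0 / delta) / (2.0 * n))


labels = ["a", "a", "b", "a", "c", "a"]
imp = misclassification_impurity(labels)        # 2 of 6 samples misclassified
eps = hoeffding_epsilon(len(labels), delta=0.05)
```

Because the indicators are bounded in [0, 1], the same epsilon also bounds the deviation obtained via McDiarmid's inequality in this case, which is the equivalence mentioned in the abstract.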

References

  1. Wang, Y., Xia, S.T.: Unifying attribute splitting criteria of decision trees by Tsallis entropy. In: 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 2507–2511 (2017)
  2. De Rosa, R., Cesa-Bianchi, N.: Splitting with confidence in decision trees with application to stream mining. In: 2015 International Joint Conference on Neural Networks (IJCNN), pp. 1–8 (2015)
  3. De Rosa, R., Cesa-Bianchi, N.: Confidence decision trees via online and active learning for streaming data. J. Artif. Intell. Res. 60, 1031–1055 (2017)
  4. Kearns, M., Mansour, Y.: On the boosting ability of top-down decision tree learning algorithms. In: Proceedings of the Twenty-Eighth Annual ACM Symposium on Theory of Computing, STOC '96, pp. 459–468. ACM, New York, NY, USA (1996)
  5. Sheth, N.S., Deshpande, A.R.: A review of splitting criteria for decision tree induction. Fuzzy Syst. 7(1) (2015)
  6. Rutkowski, L., Jaworski, M., Duda, P., Pietruczuk, L.: On a splitting criterion in decision trees for data streams. In: Proceedings of the 9th International Conference on Machine Learning and Data Mining, pp. 7–11. ibai-Publishing, New York, USA (2013)
  7. Matuszyk, P., Krempl, G., Spiliopoulou, M.: Correcting the usage of the Hoeffding inequality in stream mining. In: Tucker, A., Höppner, F., Siebes, A., Swift, S. (eds.) Advances in Intelligent Data Analysis XII. Lecture Notes in Computer Science, vol. 8207, pp. 298–309. Springer, Berlin, Heidelberg (2013)
  8. Rutkowski, L., Jaworski, M., Pietruczuk, L., Duda, P.: A new method for data stream mining based on the misclassification error. IEEE Trans. Neural Netw. Learn. Syst. 26(5), 1048–1059 (2015)
  9. Wasserman, L.: All of Statistics: A Concise Course in Statistical Inference. Springer Texts in Statistics. Springer, New York (2005)
  10. Jin, R., Agrawal, G.: Efficient decision tree construction on streaming data. In: Proceedings of the 9th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD '03, pp. 571–576. New York, NY, USA (2003)
  11. Rutkowski, L., Jaworski, M., Pietruczuk, L., Duda, P.: Decision trees for mining data streams based on the Gaussian approximation. IEEE Trans. Knowl. Data Eng. 26(1), 108–119 (2014)
  12. Rutkowski, L., Jaworski, M., Pietruczuk, L., Duda, P.: The CART decision tree for mining data streams. Inf. Sci. 266, 1–15 (2014)
  13. Jaworski, M., Rutkowski, L., Pawlak, M.: Hybrid splitting criterion in decision trees for data stream mining. In: Rutkowski, L., Korytkowski, M., Scherer, R., Tadeusiewicz, R., Zadeh, L.A., Zurada, J.M. (eds.) Artificial Intelligence and Soft Computing, pp. 60–72. Springer International Publishing, Cham (2016)
  14. Jaworski, M., Duda, P., Rutkowski, L.: New splitting criteria for decision trees in stationary data streams. IEEE Trans. Neural Netw. Learn. Syst. 29, 2516–2529 (2018)
  15. Domingos, P., Hulten, G.: Mining high-speed data streams. In: Proceedings of the 6th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 71–80 (2000)

Copyright information

© Springer Nature Switzerland AG 2020

Authors and Affiliations

  • Leszek Rutkowski (1, 2)
  • Maciej Jaworski (1)
  • Piotr Duda (1)
  1. Institute of Computational Intelligence, Czestochowa University of Technology, Częstochowa, Poland
  2. Information Technology Institute, University of Social Sciences, Lodz, Poland
