Abstract
One way of resolving the incompatibility between nonlinear split measures, such as the information gain or the Gini gain, and Hoeffding's inequality is to apply another statistical tool, e.g. McDiarmid's inequality. Another way is to find a split measure that can be expressed as an arithmetic average of random variables, since Hoeffding's inequality is applicable in this case. In the literature, many impurity measures other than the information entropy or the Gini index have been considered [1]. For example, in [2, 3] the Kearns–Mansour index [4] was taken into account. A survey of various splitting criteria can be found in [5]. However, these functions are nonlinear and cannot be expressed as the desired sum of random variables. In this chapter, a split measure based on the misclassification error impurity measure is proposed [6, 7], which has the above-mentioned property. In the case of the misclassification error, the bounds obtained using Hoeffding's inequality and McDiarmid's inequality are equivalent.
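The key point of the abstract can be illustrated with a small sketch. The misclassification-error impurity of a node is one minus the fraction of the majority class, i.e. an empirical mean of bounded 0/1 indicator variables, so Hoeffding's inequality applies to it directly. The sketch below is illustrative only and not the chapter's algorithm; the function names and the use of range R = 1 are assumptions.

```python
import math
from collections import Counter

def misclassification_error(labels):
    """Misclassification-error impurity: 1 - (majority class fraction).

    Equivalently, the empirical mean of the 0/1 indicator 'this sample
    is not in the majority class', so it is an arithmetic average of
    bounded random variables.
    """
    if not labels:
        return 0.0
    counts = Counter(labels)
    return 1.0 - max(counts.values()) / len(labels)

def hoeffding_epsilon(delta, n, value_range=1.0):
    """Hoeffding bound: with probability >= 1 - delta, the empirical mean
    of n i.i.d. variables bounded in a range of width `value_range`
    deviates from its expectation by less than this epsilon."""
    return math.sqrt(value_range ** 2 * math.log(1.0 / delta) / (2.0 * n))

# Example: impurity of a node and the deviation bound for n samples.
labels = ["a", "a", "b", "b", "b", "b"]
impurity = misclassification_error(labels)   # 1 - 4/6
eps = hoeffding_epsilon(delta=0.05, n=1000)
```

Because the impurity is itself a bounded empirical mean, the same epsilon can bound the difference between split-measure values of two candidate attributes, which is what a Hoeffding-tree-style split decision requires.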
References
Wang, Y., Xia, S.T.: Unifying attribute splitting criteria of decision trees by Tsallis entropy. In: 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 2507–2511 (2017)
De Rosa, R., Cesa-Bianchi, N.: Splitting with confidence in decision trees with application to stream mining. In: 2015 International Joint Conference on Neural Networks (IJCNN), pp. 1–8 (2015)
De Rosa, R., Cesa-Bianchi, N.: Confidence decision trees via online and active learning for streaming data. J. Artif. Intell. Res. 60, 1031–1055 (2017)
Kearns, M., Mansour, Y.: On the boosting ability of top-down decision tree learning algorithms. In: Proceedings of the Twenty-Eighth Annual ACM Symposium on Theory of Computing, STOC '96, pp. 459–468. ACM, New York, NY, USA (1996)
Sheth, N.S., Deshpande, A.R.: A review of splitting criteria for decision tree induction. Fuzzy Syst. 7(1) (2015)
Rutkowski, L., Jaworski, M., Duda, P., Pietruczuk, L.: On a splitting criterion in decision trees for data streams. In: Proceedings of the 9th International Conference on Machine Learning and Data Mining, pp. 7–11. ibai-Publishing, New York, USA (2013)
Matuszyk, P., Krempl, G., Spiliopoulou, M.: Correcting the usage of the Hoeffding inequality in stream mining. In: Tucker, A., Höppner, F., Siebes, A., Swift, S. (eds.) Advances in Intelligent Data Analysis XII. Lecture Notes in Computer Science, vol. 8207, pp. 298–309, Springer, Berlin, Heidelberg (2013)
Rutkowski, L., Jaworski, M., Pietruczuk, L., Duda, P.: A new method for data stream mining based on the misclassification error. IEEE Trans. Neural Netw. Learn. Syst. 26(5), 1048–1059 (2015)
Wasserman, L.: All of Statistics: A Concise Course in Statistical Inference. Springer Texts in Statistics, Springer, New York (2005)
Jin, R., Agrawal, G.: Efficient decision tree construction on streaming data. In: Proceedings of the 9th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD '03, pp. 571–576. ACM, New York, NY, USA (2003)
Rutkowski, L., Jaworski, M., Pietruczuk, L., Duda, P.: Decision trees for mining data streams based on the Gaussian approximation. IEEE Trans. Knowl. Data Eng. 26(1), 108–119 (2014)
Rutkowski, L., Jaworski, M., Pietruczuk, L., Duda, P.: The CART decision tree for mining data streams. Inf. Sci. 266, 1–15 (2014)
Jaworski, M., Rutkowski, L., Pawlak, M.: Hybrid splitting criterion in decision trees for data stream mining. In: Rutkowski, L., Korytkowski, M., Scherer, R., Tadeusiewicz, R., Zadeh, L.A., Zurada, J.M. (eds.) Artificial Intelligence and Soft Computing, pp. 60–72. Springer International Publishing, Cham (2016)
Jaworski, M., Duda, P., Rutkowski, L.: New splitting criteria for decision trees in stationary data streams. IEEE Trans. Neural Netw. Learn. Syst. 29, 2516–2529 (2018)
Domingos, P., Hulten, G.: Mining high-speed data streams. In: Proceedings of the 6th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 71–80 (2000)
Copyright information
© 2020 Springer Nature Switzerland AG
Cite this chapter
Rutkowski, L., Jaworski, M., Duda, P. (2020). Misclassification Error Impurity Measure. In: Stream Data Mining: Algorithms and Their Probabilistic Properties. Studies in Big Data, vol 56. Springer, Cham. https://doi.org/10.1007/978-3-030-13962-9_5
DOI: https://doi.org/10.1007/978-3-030-13962-9_5
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-13961-2
Online ISBN: 978-3-030-13962-9
eBook Packages: Intelligent Technologies and Robotics (R0)