Abstract
One way of resolving the incompatibility between nonlinear split measures, such as the information gain or the Gini gain, and Hoeffding's inequality is to apply another statistical tool, e.g. McDiarmid's inequality. Another way is to find a split measure that can be expressed as an arithmetic average of random variables, since Hoeffding's inequality is applicable in this case. In the literature, many impurity measures other than the information entropy or the Gini index have been considered [1]. For example, in [2, 3] the Kearns–Mansour index [4] was taken into account. A survey of various splitting criteria can be found in [5]. However, these functions are nonlinear and cannot be expressed as the desired sum of random variables. In this chapter, a split measure based on the misclassification error impurity measure is proposed [6, 7], which has the above-mentioned property. In the case of the misclassification error, the bounds obtained using Hoeffding's inequality and McDiarmid's inequality are equivalent.
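The key point of the abstract can be illustrated with a small sketch. The misclassification-error impurity of a node is one minus the fraction of the majority class, i.e. an empirical mean of bounded 0/1 indicator variables, so Hoeffding's inequality applies to it directly. The sketch below is illustrative only and not the chapter's algorithm; the function names and the use of range R = 1 are assumptions.

```python
import math
from collections import Counter

def misclassification_error(labels):
    """Misclassification-error impurity: 1 - (majority class fraction).

    Equivalently, the empirical mean of the 0/1 indicator 'this sample
    is not in the majority class', so it is an arithmetic average of
    bounded random variables.
    """
    if not labels:
        return 0.0
    counts = Counter(labels)
    return 1.0 - max(counts.values()) / len(labels)

def hoeffding_epsilon(delta, n, value_range=1.0):
    """Hoeffding bound: with probability >= 1 - delta, the empirical mean
    of n i.i.d. variables bounded in a range of width `value_range`
    deviates from its expectation by less than this epsilon."""
    return math.sqrt(value_range ** 2 * math.log(1.0 / delta) / (2.0 * n))

# Example: impurity of a node and the deviation bound for n samples.
labels = ["a", "a", "b", "b", "b", "b"]
impurity = misclassification_error(labels)   # 1 - 4/6
eps = hoeffding_epsilon(delta=0.05, n=1000)
```

Because the impurity is itself a bounded empirical mean, the same epsilon can bound the difference between split-measure values of two candidate attributes, which is what a Hoeffding-tree-style split decision requires.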
References
Wang, Y., Xia, S.T.: Unifying attribute splitting criteria of decision trees by Tsallis entropy. In: 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 2507–2511 (2017)
De Rosa, R., Cesa-Bianchi, N.: Splitting with confidence in decision trees with application to stream mining. In: 2015 International Joint Conference on Neural Networks (IJCNN), pp. 1–8 (2015)
De Rosa, R., Cesa-Bianchi, N.: Confidence decision trees via online and active learning for streaming data. J. Artif. Intell. Res. 60, 1031–1055 (2017)
Kearns, M., Mansour, Y.: On the boosting ability of top-down decision tree learning algorithms. In: Proceedings of the Twenty-Eighth Annual ACM Symposium on Theory of Computing, STOC '96, pp. 459–468. ACM, New York, NY, USA (1996)
Sheth, N.S., Deshpande, A.R.: A review of splitting criteria for decision tree induction. Fuzzy Syst. 7(1) (2015)
Rutkowski, L., Jaworski, M., Duda, P., Pietruczuk, L.: On a splitting criterion in decision trees for data streams. In: Proceedings of the 9th International Conference on Machine Learning and Data Mining, pp. 7–11. ibai-Publishing, New York, USA (2013)
Matuszyk, P., Krempl, G., Spiliopoulou, M.: Correcting the usage of the Hoeffding inequality in stream mining. In: Tucker, A., Höppner, F., Siebes, A., Swift, S. (eds.) Advances in Intelligent Data Analysis XII. Lecture Notes in Computer Science, vol. 8207, pp. 298–309, Springer, Berlin, Heidelberg (2013)
Rutkowski, L., Jaworski, M., Pietruczuk, L., Duda, P.: A new method for data stream mining based on the misclassification error. IEEE Trans. Neural Netw. Learn. Syst. 26(5), 1048–1059 (2015)
Wasserman, L.: All of Statistics: A Concise Course in Statistical Inference. Springer Texts in Statistics, Springer, New York (2005)
Jin, R., Agrawal, G.: Efficient decision tree construction on streaming data. In: Proceedings of the 9th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD '03, pp. 571–576. ACM, New York, NY, USA (2003)
Rutkowski, L., Jaworski, M., Pietruczuk, L., Duda, P.: Decision trees for mining data streams based on the Gaussian approximation. IEEE Trans. Knowl. Data Eng. 26(1), 108–119 (2014)
Rutkowski, L., Jaworski, M., Pietruczuk, L., Duda, P.: The CART decision tree for mining data streams. Inf. Sci. 266, 1–15 (2014)
Jaworski, M., Rutkowski, L., Pawlak, M.: Hybrid splitting criterion in decision trees for data stream mining. In: Rutkowski, L., Korytkowski, M., Scherer, R., Tadeusiewicz, R., Zadeh, L.A., Zurada, J.M. (eds.) Artificial Intelligence and Soft Computing, pp. 60–72. Springer International Publishing, Cham (2016)
Jaworski, M., Duda, P., Rutkowski, L.: New splitting criteria for decision trees in stationary data streams. IEEE Trans. Neural Netw. Learn. Syst. 29, 2516–2529 (2018)
Domingos, P., Hulten, G.: Mining high-speed data streams. In: Proceedings of the 6th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 71–80 (2000)
Copyright information
© 2020 Springer Nature Switzerland AG
Cite this chapter
Rutkowski, L., Jaworski, M., Duda, P. (2020). Misclassification Error Impurity Measure. In: Stream Data Mining: Algorithms and Their Probabilistic Properties. Studies in Big Data, vol 56. Springer, Cham. https://doi.org/10.1007/978-3-030-13962-9_5
DOI: https://doi.org/10.1007/978-3-030-13962-9_5
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-13961-2
Online ISBN: 978-3-030-13962-9
eBook Packages: Intelligent Technologies and Robotics (R0)