Part of the book series: Studies in Big Data ((SBD,volume 56))

Abstract

One way of resolving the incompatibility between nonlinear split measures, such as the information gain or the Gini gain, and Hoeffding's inequality is to apply another statistical tool, e.g. McDiarmid's inequality. Another way is to find a split measure which can be expressed as an arithmetic average of some random variables, since Hoeffding's inequality is applicable in this case. In the literature, many impurity measures other than the information entropy or the Gini index have been considered [1]. For example, in [2, 3] the Kearns–Mansour index [4] was taken into account. A survey of various splitting criteria can be found in [5]. However, these functions are nonlinear and cannot be expressed as the desired sum of random variables. In this chapter, a split measure based on the misclassification error impurity measure is proposed [6, 7], which has the above-mentioned property. In the case of the misclassification error, the bounds obtained using Hoeffding's inequality and McDiarmid's inequality are equivalent.
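The key property claimed above can be sketched in a few lines: the misclassification error impurity of a node, 1 − p_max, equals the arithmetic average of the indicator variables 1[y_i ≠ majority class], so Hoeffding's inequality for bounded i.i.d. averages applies to it directly. The following is a minimal illustrative sketch, not the chapter's actual algorithm; the function names are ours.

```python
import math
from collections import Counter

def misclassification_error(labels):
    """Misclassification error impurity: 1 - p_max.

    This equals the mean of the indicators 1[y_i != majority class],
    i.e. an arithmetic average of bounded random variables, which is
    exactly the form to which Hoeffding's inequality applies.
    """
    counts = Counter(labels)
    p_max = max(counts.values()) / len(labels)
    return 1.0 - p_max

def hoeffding_epsilon(n, delta):
    """Hoeffding bound for an average of n i.i.d. variables in [0, 1]:
    with probability at least 1 - delta, the empirical mean deviates
    from its expectation by at most this epsilon.
    """
    return math.sqrt(math.log(1.0 / delta) / (2.0 * n))
```

For instance, after observing n = 200 labels, the empirical impurity is within `hoeffding_epsilon(200, 0.05)` ≈ 0.087 of its expectation with probability at least 0.95; nonlinear measures such as the information gain admit no such direct bound.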


References

  1. Wang, Y., Xia, S.T.: Unifying attribute splitting criteria of decision trees by Tsallis entropy. In: 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 2507–2511 (2017)

  2. De Rosa, R., Cesa-Bianchi, N.: Splitting with confidence in decision trees with application to stream mining. In: 2015 International Joint Conference on Neural Networks (IJCNN), pp. 1–8 (2015)

  3. De Rosa, R., Cesa-Bianchi, N.: Confidence decision trees via online and active learning for streaming data. J. Artif. Intell. Res. 60, 1031–1055 (2017)

  4. Kearns, M., Mansour, Y.: On the boosting ability of top-down decision tree learning algorithms. In: Proceedings of the Twenty-Eighth Annual ACM Symposium on Theory of Computing, STOC ’96, pp. 459–468. ACM, New York, NY, USA (1996)

  5. Sheth, N.S., Deshpande, A.R.: A review of splitting criteria for decision tree induction. Fuzzy Syst. 7(1) (2015)

  6. Rutkowski, L., Jaworski, M., Duda, P., Pietruczuk, L.: On a splitting criterion in decision trees for data streams. In: Proceedings of the 9th International Conference on Machine Learning and Data Mining, pp. 7–11, ibai-Publishing, New York, USA (2013)

  7. Matuszyk, P., Krempl, G., Spiliopoulou, M.: Correcting the usage of the Hoeffding inequality in stream mining. In: Tucker, A., Höppner, F., Siebes, A., Swift, S. (eds.) Advances in Intelligent Data Analysis XII. Lecture Notes in Computer Science, vol. 8207, pp. 298–309, Springer, Berlin, Heidelberg (2013)

  8. Rutkowski, L., Jaworski, M., Pietruczuk, L., Duda, P.: A new method for data stream mining based on the misclassification error. IEEE Trans. Neural Netw. Learn. Syst. 26(5), 1048–1059 (2015)

  9. Wasserman, L.: All of Statistics: A Concise Course in Statistical Inference. Springer Texts in Statistics, Springer, New York (2005)

  10. Jin, R., Agrawal, G.: Efficient decision tree construction on streaming data. In: Proceedings of the 9th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD ’03, pp. 571–576. New York, NY, USA (2003)

  11. Rutkowski, L., Jaworski, M., Pietruczuk, L., Duda, P.: Decision trees for mining data streams based on the Gaussian approximation. IEEE Trans. Knowl. Data Eng. 26(1), 108–119 (2014)

  12. Rutkowski, L., Jaworski, M., Pietruczuk, L., Duda, P.: The CART decision tree for mining data streams. Inf. Sci. 266, 1–15 (2014)

  13. Jaworski, M., Rutkowski, L., Pawlak, M.: Hybrid splitting criterion in decision trees for data stream mining. In: Rutkowski, L., Korytkowski, M., Scherer, R., Tadeusiewicz, R., Zadeh, L.A., Zurada, J.M. (eds.) Artificial Intelligence and Soft Computing, (Cham), pp. 60–72, Springer International Publishing (2016)

  14. Jaworski, M., Duda, P., Rutkowski, L.: New splitting criteria for decision trees in stationary data streams. IEEE Trans. Neural Netw. Learn. Syst. 29, 2516–2529 (2018)

  15. Domingos, P., Hulten, G.: Mining high-speed data streams. In: Proceedings of the 6th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 71–80 (2000)

Corresponding author

Correspondence to Leszek Rutkowski.

Copyright information

© 2020 Springer Nature Switzerland AG

About this chapter

Cite this chapter

Rutkowski, L., Jaworski, M., Duda, P. (2020). Misclassification Error Impurity Measure. In: Stream Data Mining: Algorithms and Their Probabilistic Properties. Studies in Big Data, vol 56. Springer, Cham. https://doi.org/10.1007/978-3-030-13962-9_5
