Splitting Criteria Based on the McDiarmid’s Theorem

  • Leszek Rutkowski
  • Maciej Jaworski
  • Piotr Duda
Chapter
Part of the Studies in Big Data book series (SBD, volume 56)

Abstract

Since Hoeffding’s inequality proved to be inapplicable to establishing splitting criteria for the information gain and the Gini gain, a different statistical tool is needed. This chapter introduces McDiarmid’s inequality [1], which generalizes Hoeffding’s inequality from sums of random variables to nonlinear functions (subject to a bounded-differences condition). Further extensions and analysis of McDiarmid’s inequality can be found in [2]. Based on McDiarmid’s inequality, two theorems are presented in this book: one for the information gain and one for the Gini index. These theorems were first published in [3]. The resulting bounds were improved in [4, 5], and the bound for the Gini index was tightened further in [6]. Hence, this book uses the bound for the information gain taken from [5] and the bound for the Gini index published in [6].
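As a rough illustration of how McDiarmid’s inequality generalizes Hoeffding’s, the sketch below uses the standard one-sided form: if changing the i-th argument of f changes its value by at most c_i, then P(f − E[f] ≥ ε) ≤ exp(−2ε² / Σ c_i²). The function names `mcdiarmid_epsilon` and `hoeffding_epsilon` and the chosen values of n and δ are assumptions made for this example only. The sketch does not reproduce the bounds for the information gain or the Gini index derived in [3–6]; it only checks that, for the sample mean, the McDiarmid bound coincides with the Hoeffding bound.

```python
import math


def mcdiarmid_epsilon(c, delta):
    """Deviation eps such that P(f - E[f] >= eps) <= delta.

    `c` holds the bounded-difference constants c_i of f: changing the
    i-th argument alone changes f by at most c_i.
    Obtained by solving exp(-2 * eps**2 / sum(c_i**2)) = delta for eps.
    """
    if not 0.0 < delta < 1.0:
        raise ValueError("delta must lie in (0, 1)")
    sum_sq = sum(ci * ci for ci in c)
    return math.sqrt(sum_sq * math.log(1.0 / delta) / 2.0)


def hoeffding_epsilon(value_range, n, delta):
    """Hoeffding deviation for the mean of n variables with range `value_range`."""
    return value_range * math.sqrt(math.log(1.0 / delta) / (2.0 * n))


# Illustrative values (assumptions, not taken from the chapter).
n, value_range, delta = 1000, 1.0, 0.05

# For the sample mean, changing one observation moves the mean by at
# most value_range / n, so every c_i equals value_range / n.
c = [value_range / n] * n

eps_mcdiarmid = mcdiarmid_epsilon(c, delta)
eps_hoeffding = hoeffding_epsilon(value_range, n, delta)
print(eps_mcdiarmid, eps_hoeffding)
```

The two values agree up to floating-point rounding, because the sample mean is the special case in which all c_i equal R/n. A nonlinear split measure would need its own constants c_i, which is what the theorems cited above provide.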

References

  1. McDiarmid, C.: On the method of bounded differences. Surveys in Combinatorics, pp. 148–188 (1989)
  2. Combes, R.: An extension of McDiarmid’s inequality. CoRR (2015). arXiv:1511.05240
  3. Rutkowski, L., Pietruczuk, L., Duda, P., Jaworski, M.: Decision trees for mining data streams based on the McDiarmid’s bound. IEEE Trans. Knowl. Data Eng. 25(6), 1272–1279 (2013)
  4. De Rosa, R., Cesa-Bianchi, N.: Splitting with confidence in decision trees with application to stream mining. In: 2015 International Joint Conference on Neural Networks (IJCNN), pp. 1–8 (2015)
  5. De Rosa, R., Cesa-Bianchi, N.: Confidence decision trees via online and active learning for streaming data. J. Artif. Intell. Res. 60(60), 1031–1055 (2017)
  6. Jaworski, M., Duda, P., Rutkowski, L.: New splitting criteria for decision trees in stationary data streams. IEEE Trans. Neural Netw. Learn. Syst. 29, 2516–2529 (2018)
  7. Domingos, P., Hulten, G.: Mining high-speed data streams. In: Proceedings of the 6th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 71–80 (2000)
  8. Duda, P., Jaworski, M., Pietruczuk, L., Rutkowski, L.: A novel application of Hoeffding’s inequality to decision trees construction for data streams. In: 2014 International Joint Conference on Neural Networks (IJCNN), pp. 3324–3330 (2014)
  9. Hulten, G., Spencer, L., Domingos, P.: Mining time-changing data streams. In: Proceedings of the 7th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 97–106 (2001)
  10. Pietruczuk, L., Duda, P., Jaworski, M.: Adaptation of decision trees for handling concept drift. In: Rutkowski, L., Korytkowski, M., Scherer, R., Tadeusiewicz, R., Zadeh, L.A., Zurada, J.M. (eds.) Artificial Intelligence and Soft Computing, pp. 459–473. Springer, Berlin (2013)
  11. Hoeffding, W.: Probability inequalities for sums of bounded random variables. J. Am. Stat. Assoc. 58, 13–30 (1963)

Copyright information

© Springer Nature Switzerland AG 2020

Authors and Affiliations

  • Leszek Rutkowski (1, 2)
  • Maciej Jaworski (1)
  • Piotr Duda (1)
  1. Institute of Computational Intelligence, Czestochowa University of Technology, Częstochowa, Poland
  2. Information Technology Institute, University of Social Sciences, Lodz, Poland