
Improving lazy decision tree for imbalanced classification by using skew-insensitive criteria

Published in Applied Intelligence

Abstract

Lazy decision tree (LazyDT) constructs a customized decision tree for each test instance, consisting of only a single path from the root to a leaf node. LazyDT has two strengths over eager decision trees: it builds shorter decision paths, and it avoids unnecessary data fragmentation. However, the split criterion LazyDT uses to construct a customized tree is information gain, which is skew-sensitive; when learning from imbalanced data sets, this skew sensitivity impedes its ability to learn the minority-class concept. In this paper, we use Hellinger distance and K-L divergence as split criteria to build two types of lazy decision trees. Experiments across a wide range of imbalanced data sets investigate the effectiveness of our methods in comparison with the lazy decision tree, C4.5, the Hellinger-distance-based decision tree, and the support vector machine. We also use SMOTE to preprocess the highly imbalanced data sets in the experiments and evaluate its effectiveness. The experimental results, contrasted through nonparametric statistical tests, demonstrate that using Hellinger distance or K-L divergence as the split criterion effectively improves the performance of LazyDT for imbalanced classification.
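For intuition, the sketch below shows how skew-insensitive split scores of this kind can be computed for a binary-class candidate split. It is a minimal illustration, not the authors' implementation: the Hellinger form follows Cieslak and Chawla's formulation, while the symmetrised K-L variant, the function names, and the eps smoothing are assumptions made here for the example.

```python
import numpy as np

def hellinger_split(pos_counts, neg_counts):
    # Hellinger distance between the per-branch distributions of the two
    # classes for one candidate split. pos_counts[j] / neg_counts[j] are
    # the class counts the split would send into branch j.
    p = np.asarray(pos_counts, dtype=float)
    n = np.asarray(neg_counts, dtype=float)
    p, n = p / p.sum(), n / n.sum()  # P(branch | class); class priors cancel
    return np.sqrt(np.sum((np.sqrt(p) - np.sqrt(n)) ** 2))

def kl_split(pos_counts, neg_counts, eps=1e-12):
    # Symmetrised K-L divergence over the same two per-class distributions;
    # the symmetrisation and the eps smoothing (avoiding log 0 on empty
    # branches) are choices made for this sketch, not necessarily the paper's.
    p = np.asarray(pos_counts, dtype=float) + eps
    n = np.asarray(neg_counts, dtype=float) + eps
    p, n = p / p.sum(), n / n.sum()
    return float(np.sum(p * np.log(p / n)) + np.sum(n * np.log(n / p)))

print(hellinger_split([90, 10], [10, 90]))  # ~0.894: classes well separated
print(hellinger_split([90, 10], [1, 9]))    # ~0.894: unchanged under skew
```

Note that the second call scales the negative class down tenfold yet yields the same score: both criteria compare the branch distributions conditioned on each class, so the class priors cancel out. This invariance is exactly the skew-insensitivity that information gain lacks.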


Notes

  1. http://www.keel.es

  2. http://www.keel.es

References

  1. Quinlan JR (2014) C4.5: Programs for machine learning. Elsevier, Amsterdam

  2. Friedman JH, Kohavi R, Yun Y (1996) Lazy decision trees. AAAI/IAAI 1:717–724

  3. Bagallo G, Haussler D (1990) Boolean feature discovery in empirical learning. Mach Learn 5(1):71–99

  4. Mahmoudi N, Duman E (2015) Detecting credit card fraud by modified fisher discriminant analysis. Expert Syst Appl 42(5):2510–2516

  5. Khor KC, Ting CY, Phon-Amnuaisuk S (2014) The effectiveness of sampling methods for the imbalanced network intrusion detection data set. Recent Advances on Soft Computing and Data Mining. Springer, Cham, pp 613–622

  6. Wan X, Liu J, Cheung WK (2014) Learning to improve medical decision making from imbalanced data without a priori cost. BMC Medical Informatics and Decision Making 14(1):111

  7. He H, Garcia EA (2009) Learning from imbalanced data. IEEE Trans Knowl Data Eng 21(9):1263–1284

  8. López V, Fernández A, García S (2014) An insight into classification with imbalanced data: Empirical results and current trends on using data intrinsic characteristics. Inf Sci 250:113–141

  9. Krawczyk B (2016) Learning from imbalanced data: Open challenges and future directions. Progress in Artificial Intelligence 5(4):221–232

  10. Chawla NV, Bowyer KW, Hall LO (2002) SMOTE: Synthetic minority over-sampling technique. J Artif Intell Res 16:321–357

  11. Han H, Wang WY, Mao BH (2005) Borderline-SMOTE: A new over-sampling method in imbalanced data sets learning. International Conference on Intelligent Computing. Springer, Berlin, Heidelberg, pp 878–887

  12. He H, Bai Y, Garcia EA (2008) ADASYN: Adaptive Synthetic sampling approach for imbalanced learning. IEEE International Joint Conference on Neural Networks, pp 1322–1328

  13. Hu S, Liang Y, Ma L (2009) MSMOTE: Improving classification performance when training data is imbalanced. IEEE 2nd International Workshop on Computer Science and Engineering, pp 13–17

  14. Barua S, Islam MM, Yao X (2014) MWMOTE–Majority weighted minority oversampling technique for imbalanced data set learning. IEEE Trans Knowl Data Eng 26(2):405–425

  15. Zhou P, Hu X, Li P (2017) Online feature selection for high-dimensional class-imbalanced data. Knowl-Based Syst 136:187–199

  16. Wu G, Chang EY (2005) KBA: Kernel Boundary alignment considering imbalanced data distribution. IEEE Trans Knowl Data Eng 17(6):786–795

  17. Xu Y (2017) Maximum margin of twin spheres support vector machine for imbalanced data classification. IEEE Trans Cybern 47(6):1540–1550

  18. Xu Y, Wang Q, Pang X (2018) Maximum margin of twin spheres machine with pinball loss for imbalanced data classification. Appl Intell 48(1):23–34

  19. Wang S, Yao X (2009) Diversity analysis on imbalanced data sets by using ensemble models. IEEE Symposium on Computational Intelligence and Data Mining, pp 324–331

  20. Chawla NV, Lazarevic A, Hall LO (2003) SMOTEBoost: Improving prediction of the minority class in boosting. European Conference on Principles of Data Mining and Knowledge Discovery. Springer, Berlin, Heidelberg, pp 107–119

  21. Liu XY, Wu J, Zhou ZH (2009) Exploratory undersampling for class-imbalance learning. IEEE Trans Syst Man Cybern B (Cybernetics) 39(2):539–550

  22. Longadge R, Dongre S (2013) Class imbalance problem in data mining: Review. arXiv:1305.1707

  23. Zhang Z, Krawczyk B, Garcìa S (2016) Empowering one-vs-one decomposition with ensemble learning for multi-class imbalanced data. Knowl-Based Syst 106:251–263

  24. Cieslak DA, Chawla NV (2008) Learning decision trees for unbalanced data. Joint European Conference on Machine Learning and Knowledge Discovery in Databases. Springer, Berlin, Heidelberg, pp 241–256

  25. Cieslak DA, Hoens TR, Chawla NV (2012) Hellinger distance decision trees are robust and skew-insensitive. Data Min Knowl Disc 24(1):136–158

  26. Hoens TR, Qian Q, Chawla NV (2012) Building decision trees for the multi-class imbalance problem. Pacific-Asia Conference on Knowledge Discovery and Data Mining. Springer, Berlin, Heidelberg, pp 122–134

  27. Lyon RJ, Brooke JM, Knowles JD (2014) Hellinger distance trees for imbalanced streams. IEEE International Conference on Pattern Recognition, pp 1969–1974

  28. Chawla NV, Cieslak DA, Hall LO (2008) Automatically countering imbalance and its empirical relationship to cost. Data Min Knowl Disc 17(2):225–252

  29. Zhang H (2012) Lazy decision tree method for distributed privacy preserving data mining. International Journal of Advancements in Computing Technology 4(14):458–465

  30. Quinlan JR (1996) Bagging, boosting, and C4.5. AAAI/IAAI 1:725–730

  31. Dietterich TG (2000) An experimental comparison of three methods for constructing ensembles of decision trees: Bagging, boosting, and randomization. Mach Learn 40(2):139–157

  32. Fern XZ, Brodley CE (2003) Boosting lazy decision trees. In: Proceedings of the 20th International Conference on Machine Learning (ICML), pp 178–185

  33. Guillame-Bert M, Dubrawski A (2016) Batched Lazy Decision Trees. arXiv:1603.02578

  34. Rao CR (1995) A review of canonical coordinates and an alternative to correspondence analysis using Hellinger distance. Qüestiió 19(1):23–63

  35. Kullback S, Leibler RA (1951) On information and sufficiency. Ann Math Stat 22(1):79–86

  36. Manning CD, Raghavan P, Schütze H (2008) Introduction to information retrieval. Cambridge University Press, Cambridge

  37. Triguero I, González S, Moyano JM (2017) KEEL 3.0: An open source software for multi-stage analysis in data mining. International Journal of Computational Intelligence Systems 10(1):1238–1249

  38. Chawla NV (2003) C4.5 and imbalanced data sets: investigating the effect of sampling method, probabilistic estimate, and decision tree structure. In: Proceedings of the ICML, 3:66

  39. Cortes C, Vapnik V (1995) Support-vector networks. Mach Learn 20(3):273–297

  40. Raeder T, Forman G, Chawla NV (2012) Learning from imbalanced data: evaluation matters. Data mining: Foundations and intelligent paradigms. Springer, Berlin, Heidelberg, pp 315–331

  41. Hand DJ, Till RJ (2001) A simple generalisation of the area under the ROC curve for multiple class classification problems. Mach Learn 45(2):171–186

  42. Demšar J (2006) Statistical comparisons of classifiers over multiple data sets. J Mach Learn Res 7(Jan):1–30

  43. García S, Herrera F (2008) An extension on “statistical comparisons of classifiers over multiple data sets” for all pairwise comparisons. J Mach Learn Res 9(Dec):2677–2694

  44. Van Den Bosch A, Weijters A, Van Den Herik HJ (1997) When small disjuncts abound, try lazy learning: A case study. Proceedings of the Seventh Belgian-Dutch Conference on Machine Learning, pp 109–118

  45. Caruana R, Niculescu-Mizil A (2006) An empirical comparison of supervised learning algorithms. In: Proceedings of the 23rd International Conference on Machine Learning, pp 161–168

  46. Fernández-Delgado M, Cernadas E, Barro S (2014) Do we need hundreds of classifiers to solve real world classification problems? J Mach Learn Res 15(1):3133–3181

  47. Banfield RE, Hall LO, Bowyer KW (2007) A comparison of decision tree ensemble creation techniques. IEEE Trans Pattern Anal Mach Intell 29(1):173–180

  48. Zhou L, Fujita H (2017) Posterior probability based ensemble strategy using optimizing decision directed acyclic graph for multi-class classification. Inf Sci 400:142–156

Acknowledgements

We would like to acknowledge support for this project from the China Postdoctoral Science Foundation (2016M600430), the National Social Science Foundation of China (16ZDA054), the Jiangsu Provincial 333 Project (BRA2017396), the Six Major Talents Peak Project of Jiangsu Province (XYDXXJS-CXTD-005), and the Outstanding Innovation Team in Philosophy and Social Science of Jiangsu Province Universities (2015ZSTD006). The authors would also like to express their gratitude to the donors of the different data sets and the maintainers of the KEEL data set repository.

Author information

Correspondence to Jie Cao.

About this article

Cite this article

Su, C., Cao, J. Improving lazy decision tree for imbalanced classification by using skew-insensitive criteria. Appl Intell 49, 1127–1145 (2019). https://doi.org/10.1007/s10489-018-1314-z
