Abstract
A lazy decision tree (LazyDT) constructs a customized decision tree for each test instance, consisting of only a single path from the root to a leaf node. LazyDT has two strengths over eager decision trees: it can build shorter decision paths, and it can avoid unnecessary data fragmentation. However, the split criterion LazyDT uses to construct a customized tree is information gain, which is skew-sensitive; when learning from imbalanced data sets, class imbalance impedes its ability to learn the minority-class concept. In this paper, we use Hellinger distance and K-L divergence as split criteria to build two types of lazy decision trees. Experiments across a wide range of imbalanced data sets investigate the effectiveness of our methods in comparison with lazy decision trees, C4.5, Hellinger distance decision trees, and support vector machines. In addition, we use SMOTE to preprocess the highly imbalanced data sets in the experiments and evaluate its effectiveness. The experimental results, validated through nonparametric statistical tests, demonstrate that using Hellinger distance or K-L divergence as the split criterion effectively improves the performance of LazyDT on imbalanced classification.
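To illustrate why Hellinger distance is skew-insensitive as a split criterion, here is a minimal sketch following the HDDT formulation of Cieslak and Chawla: the distance is computed between the within-branch class rates, normalized per class rather than per branch, so the class priors drop out. The function name and the count-pair input format are illustrative, not the authors' actual implementation.

```python
import math

def hellinger_split_value(partitions):
    """Hellinger distance induced by a candidate split (binary classes).

    `partitions` is a list of (n_pos, n_neg) counts, one pair per branch.
    Each branch's counts are normalized by the per-class totals, so the
    criterion is unaffected by how imbalanced the two classes are.
    """
    total_pos = sum(p for p, _ in partitions)
    total_neg = sum(n for _, n in partitions)
    s = 0.0
    for n_pos, n_neg in partitions:
        # Squared difference of the square-rooted class-conditional rates.
        s += (math.sqrt(n_pos / total_pos) - math.sqrt(n_neg / total_neg)) ** 2
    return math.sqrt(s)
```

For example, a split that perfectly separates the classes scores sqrt(2) regardless of whether the data set is balanced or 1:1000 imbalanced, whereas a split that preserves the class rates in every branch scores 0.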
Acknowledgements
We would like to acknowledge support for this project from the China Postdoctoral Science Foundation (2016M600430), the National Social Science Foundation of China (16ZDA054), the Jiangsu Provincial 333 Project (BRA2017396), the Six Major Talents Peak Project of Jiangsu Province (XYDXXJS-CXTD-005), and the Outstanding Innovation Team of Philosophy and Social Science in Colleges and Universities in Jiangsu Province (2015ZSTD006). The authors also thank the donors of the various data sets and the maintainers of the KEEL data set repository.
Cite this article
Su, C., Cao, J. Improving lazy decision tree for imbalanced classification by using skew-insensitive criteria. Appl Intell 49, 1127–1145 (2019). https://doi.org/10.1007/s10489-018-1314-z