Abstract
Asymmetric misclassification costs are common in many real-world applications. However, most traditional classifiers pursue high recognition accuracy on the assumption that all misclassification errors incur the same cost. This paper proposes two cost-sensitive models based on support vector data description (SVDD) that minimize classification cost while maximizing classification accuracy. The one-class classifier SVDD is extended to two two-class models. Cost information is incorporated into training to trade off generalization performance between the classes so as to minimize misclassification cost, and it is also used to build the decision rules. The optimization problems of the two proposed models are solved with the sequential minimal optimization (SMO) algorithm. However, SMO must check all samples to select the working set in each iteration, which is very time consuming. Because only the support vectors are needed to describe the boundaries, a sample selection approach is proposed that selects edge and overlapping samples to shorten training time and reduce storage requirements, and removes outliers to overcome local overlearning. Experimental results on synthetic and public datasets demonstrate the effectiveness and efficiency of the proposed methods.
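As a rough illustration of the ingredients named in the abstract, the following sketch solves the standard one-class SVDD dual (Tax and Duin) with an SMO-style pairwise update that scans all samples for the most violating pair. It is not the two-class formulation proposed in the paper. The function name `svdd_smo`, the linear kernel, and the per-sample upper bounds `upper` (one plausible way to inject cost information, by capping how much each sample can pull on the sphere) are assumptions made for this example only.

```python
def svdd_smo(points, upper, tol=1e-6, max_iter=10_000):
    """SMO-style solver for the linear-kernel SVDD dual (illustrative sketch).

    Minimises  sum_ij a_i a_j K_ij - sum_i a_i K_ii
    subject to sum_i a_i = 1 and 0 <= a_i <= upper[i].
    The per-sample bounds `upper` are a hypothetical cost-weighting device.
    """
    n = len(points)
    assert all(u >= 1.0 / n for u in upper), "uniform start must be feasible"

    def dot(p, q):
        return sum(x * y for x, y in zip(p, q))

    K = [[dot(p, q) for q in points] for p in points]
    alpha = [1.0 / n] * n
    eps = 1e-12

    for _ in range(max_iter):
        # Gradient of the dual objective with respect to each multiplier.
        grad = [2.0 * sum(alpha[j] * K[i][j] for j in range(n)) - K[i][i]
                for i in range(n)]
        # Working-set selection: scan every sample for the most violating pair.
        up = [k for k in range(n) if alpha[k] < upper[k] - eps]
        down = [k for k in range(n) if alpha[k] > eps]
        if not up or not down:
            break
        i = min(up, key=lambda k: grad[k])    # multiplier to increase
        j = max(down, key=lambda k: grad[k])  # multiplier to decrease
        gap = grad[j] - grad[i]
        if gap < tol:
            break
        curvature = K[i][i] + K[j][j] - 2.0 * K[i][j]
        if curvature <= eps:
            break
        # Exact line search along e_i - e_j, clipped to the box constraints.
        step = min(gap / (2.0 * curvature), upper[i] - alpha[i], alpha[j])
        alpha[i] += step
        alpha[j] -= step

    dim = len(points[0])
    center = [sum(alpha[k] * points[k][d] for k in range(n)) for d in range(dim)]

    def dist2(p):
        return sum((x - c) ** 2 for x, c in zip(p, center))

    # Radius from support vectors strictly inside the box; as a simplification,
    # fall back to all support vectors if none exist.
    free = [k for k in range(n) if eps < alpha[k] < upper[k] - eps]
    if not free:
        free = [k for k in range(n) if alpha[k] > eps]
    radius2 = sum(dist2(points[k]) for k in free) / len(free)
    return alpha, center, radius2


# Four corners of a square plus an interior point, with loose bounds.
square = [(1.0, 1.0), (1.0, -1.0), (-1.0, 1.0), (-1.0, -1.0), (0.0, 0.0)]
alpha_sq, center_sq, radius2_sq = svdd_smo(square, [1.0] * 5)

# Same corners plus a distant point whose multiplier is capped at 0.25.
with_outlier = square[:4] + [(5.0, 5.0)]
caps = [1.0, 1.0, 1.0, 1.0, 0.25]
alpha_out, center_out, radius2_out = svdd_smo(with_outlier, caps)
```

Every iteration recomputes the gradient and scans all samples, which is the cost the abstract attributes to SMO. Passing only a pre-selected subset of `points` (for example, edge and overlapping samples) shrinks that scan accordingly. How the paper selects those samples is not specified in the abstract, so no selection step is shown here.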
Acknowledgments
This work is supported by the National Natural Science Foundation of China under Grants 60975026 and 61273275.
Cite this article
Zhao, Z., Wang, X. Cost-sensitive SVDD models based on a sample selection approach. Appl Intell 48, 4247–4266 (2018). https://doi.org/10.1007/s10489-018-1187-1
DOI: https://doi.org/10.1007/s10489-018-1187-1