Abstract
An unstable cuts-based sample selection (UCBSS) method is proposed. It addresses the significant time requirements and computational complexity of traditional distance-based sample selection methods when compressing large datasets. The core idea is that a convex function attains its extreme values at boundary points. The method measures how close each sample lies to the class boundary by marking unstable cuts, counting the number of unstable cuts per sample and applying a threshold, and thereby obtains unstable subsets. Experimental results show that the proposed method is well suited to compressing large datasets with a high imbalance ratio. Compared with the traditional condensed nearest neighbour (CNN) method, it obtains similar compression ratios and higher G-mean values on datasets with a high imbalance ratio. When the discriminant function of the classifier is convex, it obtains similar accuracy and higher compression ratios on datasets with significant noise. In addition, its run time shows a clear advantage.
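The selection step summarized in the abstract can be sketched as follows. This is a minimal illustration, not the paper's implementation: it assumes an "unstable cut" is a cut point between two adjacent samples of different classes when the data are sorted by one attribute, and the function name, parameters, and crediting of both neighbouring samples are hypothetical.

```python
import numpy as np

def unstable_cut_selection(X, y, threshold=1):
    """Sketch of unstable-cut-based sample selection.

    For each attribute, samples are sorted by value; a cut point between
    two adjacent samples with different class labels is treated as
    'unstable'. Samples adjacent to at least `threshold` unstable cuts
    are regarded as boundary samples and kept.
    """
    n, d = X.shape
    counts = np.zeros(n, dtype=int)
    for j in range(d):
        order = np.argsort(X[:, j], kind="stable")
        labels = y[order]
        # adjacent pairs with differing labels mark an unstable cut;
        # both neighbouring samples have their counters incremented
        diff = labels[:-1] != labels[1:]
        counts[order[:-1][diff]] += 1
        counts[order[1:][diff]] += 1
    mask = counts >= threshold
    return X[mask], y[mask], counts
```

Because each attribute requires only a sort and one linear pass, the cost is O(d · n log n), which is the source of the run-time advantage over pairwise distance computation.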
References
Wang XZ (2015) Learning from big data with uncertainty: editorial. J Intell Fuzzy Syst 28(5):2329–2330
Wang XZ, Huang ZX (2015) Editorial: Uncertainty in learning from big data. Fuzzy Sets Syst 258:1–4
Hart P (1968) The condensed nearest neighbor rule. IEEE Trans Inf Theory 14(5):515–516
Gates GW (1972) The reduced nearest neighbor rule. IEEE Trans Inf Theory 18(3):431–433
Tomek I (1976) An experiment with the edited nearest-neighbor rule. IEEE Trans Syst Man Cybern SMC-6(6):448–452
Tomek I (1976) Two modifications of CNN. IEEE Trans Syst Man Cybern SMC-6(11):769–772
Wilson DR, Martinez TR (2000) Reduction techniques for instance-based learning algorithms. Mach Learn 38(3):257–286
Brighton H, Mellish C (2002) Advances in instance selection for instance-based learning algorithms. Data Min Knowl Disc 6(2):153–172
Ritter GL, Woodruff HB, Lowry SR et al (1975) An algorithm for a selective nearest neighbour decision rule. IEEE Trans Inf Theory 21(6):665–669
Dasarathy BV (1994) Minimal consistent set (MCS) identification for optimal nearest neighbor decision systems design. IEEE Trans Syst Man Cybern 24(1):511–517
Fayed HA, Atiya AF (2009) A novel template reduction approach for the k-nearest neighbor method. IEEE Trans Neural Networks 20(5):890–896
Nikolaidis K, Goulermas JY, Wu QH (2011) A class boundary preserving algorithm for data condensation. Pattern Recognit 44(3):704–715
Zhai JH, Li T, Wang XZ (2016) A cross-instance selection algorithm. J Intell Fuzzy Syst 30(2):717–728
Chen JN, Zhang CM, Xue XP, Liu CL (2013) Fast instance selection for speeding up support vector machines. Knowl Based Syst 45:1–7
Chou CH, Kuo BH, Chang F (2006) The generalized condensed nearest neighbor rule as a data reduction method. In: Proceedings of the 18th International Conference on Pattern Recognition (ICPR), Hong Kong, pp 556–559
Li YH, Maguire L (2011) Selecting critical patterns based on local geometrical and statistical information. IEEE Trans Pattern Anal Mach Intell 33(6):1189–1201
Wilson DL (1972) Asymptotic properties of nearest neighbor rules using edited data. IEEE Trans Syst Man Cybern SMC-2(3):408–421
Lowe DG (1995) Similarity metric learning for a variable-kernel classifier. Neural Comput 7(1):72–85
Aha DW, Kibler D, Albert MK (1991) Instance-based learning algorithms. Mach Learn 6:37–66
Wilson DR, Martinez TR (1997) Instance pruning techniques. In: Fisher D (ed) Machine learning: Proceedings of the Fourteenth International Conference (ICML’97). Morgan Kaufmann Publishers, San Francisco, pp 403–411
Tsai CF, Chen ZY (2014) Towards high dimensional instance selection: an evolutionary approach. Decis Support Syst 61:79–92
Tsai CF, Chang CW (2013) SVOIS: Support vector oriented instance selection for text classification. Inf Syst 38(8):1070–1083
García-Osorio C, Haro-García AD, García-Pedrajas N (2010) Democratic instance selection: a linear complexity instance selection algorithm based on classifier ensemble concepts. Artif Intell 174:410–441
Haro-García AD, García-Pedrajas N, Castillo JARD (2012) Large scale instance selection by means of federal instance selection. Data Knowl Eng 75:58–77
Wang XZ, Dong LC, Yan JH (2012) Maximum ambiguity-based sample selection in fuzzy decision tree induction. IEEE Trans Knowl Data Eng 24(8):1491–1505
Fu YF, Zhu XQ, Elmagarmid AK (2013) Active learning with optimal instance subset selection [J]. IEEE Trans Cybern 44(5):464–475
Zhai TT, He ZF (2013) Instance selection for time series classification based on immune binary particle swarm optimization. Knowl-Based Syst 49:106–115
Wang XZ, Xing S, Zhao SX (2016) Unstable cut-points based sample selection for large data classification 29(9):780–789
Lv J, Yi Z (2005) An improved backpropagation algorithm using absolute error function. In: Advances in Neural Networks – ISNN 2005. Lecture Notes in Computer Science, vol 3496. Springer, Berlin, Heidelberg, pp 585–590
Breiman L, Friedman JH, Olshen RA, Stone CJ (1984) Classification and regression trees. Wadsworth International Group, Belmont
Breiman L (1996) Technical note: Some properties of splitting criteria. Mach Learn 24:41–47
Rokach L, Maimon O (2005) Top-down induction of decision trees classifiers-a survey. IEEE Trans Syst Man Cybern Part C 35(4):476–488
Quinlan JR (1986) Induction of decision trees. Mach Learn 1(1):81–106
Quinlan JR (1996) Improved use of continuous attributes in C4.5. J Artif Intell Res 4:77–90
Fayyad UM, Irani KB (1992) On the handling of continuous-valued attributes in decision tree generation. Mach Learn 8:87–102
Fayyad UM, Irani KB (1993) Multi-interval discretization of continuous-valued attributes for classification learning. In: Proceedings of the 13th International Joint Conference on Artificial Intelligence (IJCAI), pp 1022–1027
UCI Machine Learning Repository. http://archive.ics.uci.edu/ml/
Huang GB, Zhu QY, Siew CK (2006) Extreme learning machine: theory and applications. Neurocomputing 70(1–3):489–501
Wang XZ, Shao QY, Qing M, Zhai JH (2013) Architecture selection for networks trained with extreme learning machine using localized generalization error model. Neurocomputing 102:3–9
Wang XZ, Chen AX, Feng HM (2011) Upper integral network with extreme learning mechanism. Neurocomputing 74(16):2520–2525
Cite this article
Xing, S., Ming, Z. A study on unstable cuts and its application to sample selection. Int. J. Mach. Learn. & Cyber. 9, 1541–1552 (2018). https://doi.org/10.1007/s13042-017-0663-y