Binary classification for imbalanced datasets using twin hyperspheres based on conformal method

Zheng, Jian; Li, Lin; Wang, Shiyan; Yan, Huyong

doi:10.1007/s10586-024-04528-x

Published: 22 May 2024

Cluster Computing (2024)

Abstract

To address binary classification of highly imbalanced data, this paper proposes a novel twin-hypersphere method with conformal transformation. A conformal mapping is applied to the original data region so that the hyperspheres can search the region containing the majority class while paying more attention to the region containing the minority class. In addition, a gain operation is applied to the kernels to tighten the classification boundaries learned by the hyperspheres. Experimental results show that the accuracy of the classification boundaries learned by the proposed method reaches 0.880 on the synthetic datasets. On the highly imbalanced dataset with an imbalance ratio of 87.8:1, the classification accuracy is 0.731, outperforming the competitors by a significant margin. Moreover, the time consumption of the proposed method does not increase exponentially, which makes it suitable for large-scale classification scenarios. We find that non-linear kernels are better at focusing on global regions, while conformal transformation helps them better perceive sub-regions. Conformal transformation also helps in observing the regions that contain hard-to-observe minority classes.
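The abstract does not give the form of the conformal mapping or of the gain operation, so the following is only an illustrative sketch of one common way a kernel can be conformally rescaled: multiplying a base kernel K(x, x') by a positive factor c(x)c(x'). The RBF base kernel, the factor `conformal_factor`, the parameters `gamma` and `tau`, and the synthetic data are all assumptions made for illustration, not the authors' method. The synthetic sample sizes are chosen only so that the imbalance ratio matches the 87.8:1 figure quoted above.

```python
import numpy as np

def sq_dists(X, Y):
    """Pairwise squared Euclidean distances between rows of X and rows of Y."""
    d = np.sum(X**2, axis=1)[:, None] + np.sum(Y**2, axis=1)[None, :] - 2.0 * X @ Y.T
    return np.maximum(d, 0.0)  # clip tiny negatives caused by round-off

def rbf_kernel(X, Y, gamma=1.0):
    """Gaussian (RBF) base kernel; an assumed choice of non-linear kernel."""
    return np.exp(-gamma * sq_dists(X, Y))

def conformal_factor(X, anchors, tau=1.0):
    """Hypothetical positive factor c(x), larger near the anchor points."""
    return 1.0 + np.exp(-sq_dists(X, anchors) / (2.0 * tau**2)).sum(axis=1)

def conformal_kernel(X, Y, anchors, gamma=1.0, tau=1.0):
    """Rescaled kernel c(x) * K(x, y) * c(y); stays positive semi-definite."""
    cx = conformal_factor(X, anchors, tau)
    cy = conformal_factor(Y, anchors, tau)
    return cx[:, None] * rbf_kernel(X, Y, gamma) * cy[None, :]

# Synthetic two-class data (illustrative only, not from the paper).
rng = np.random.default_rng(0)
majority = rng.normal(0.0, 1.0, size=(878, 2))
minority = rng.normal(3.0, 0.3, size=(10, 2))
X = np.vstack([majority, minority])

imbalance_ratio = len(majority) / len(minority)

# Use the minority samples as anchors so their neighbourhood is magnified.
K = conformal_kernel(X, X, anchors=minority)
diag = np.diag(K)
print("imbalance ratio:", imbalance_ratio)
print("mean self-similarity, majority:", diag[:878].mean())
print("mean self-similarity, minority:", diag[878:].mean())
```

In this sketch the rescaling enlarges kernel values around the minority samples, which is one way to make a kernel-based classifier more sensitive to a sparsely populated region while leaving the rest of the space nearly unchanged.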

Data availability

Data will be made available on request. The data is cited at http://archive.ics.uci.edu/ml/datasets/LED+Display+Domain, http://archive.ics.uci.edu/ml/datasets/Ecoli, http://archive.ics.uci.edu/ml/datasets/Yeast, http://archive.ics.uci.edu/ml/machine-learning-databases/wine-quality/, http://archive.ics.uci.edu/ml/datasets/Poker+Hand

Funding

This research is supported by the Science and Technology Research Program of Chongqing Municipal Education Commission of China under Grant KJQN202300801, the Natural Science Foundation of Chongqing under Grant cstc2021jcyj-msxmX0518, and the National Social Science Fund Project of China under Grant 19BTJ027.

Author information

Authors and Affiliations

School of Artificial Intelligence, Chongqing Technology and Business University, Chongqing, 400067, China
Jian Zheng & Huyong Yan
College of Mathematics and Statistics, Chongqing Three Gorges University, Chongqing, 404199, China
Lin Li
School of Communication and Information Engineering, Chongqing University of Posts and Telecommunications, Chongqing, 400065, China
Shiyan Wang

Contributions

Jian Zheng proposed the method and wrote the manuscript. Lin Li and Shiyan Wang implemented the source code. Jian Zheng and Huyong Yan analyzed the experimental results.

Corresponding author

Correspondence to Jian Zheng.

Ethics declarations

Competing interests

The authors declare no competing interests.

Ethical approval

This article does not contain any studies with human participants or animals performed by any of the authors.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Zheng, J., Li, L., Wang, S. et al. Binary classification for imbalanced datasets using twin hyperspheres based on conformal method. Cluster Comput (2024). https://doi.org/10.1007/s10586-024-04528-x

Received: 20 February 2024
Revised: 22 April 2024
Accepted: 22 April 2024
Published: 22 May 2024
DOI: https://doi.org/10.1007/s10586-024-04528-x
