Binary classification for imbalanced datasets using twin hyperspheres based on conformal method

Cluster Computing

Abstract

This paper proposes a novel twin-hypersphere method with conformal transformation for binary classification of highly imbalanced data. A conformal mapping is applied to the original data region so that the hyperspheres can search the region containing the majority class while paying more attention to the region containing the minority class. In addition, a gain operation is applied to the kernels to tighten the classification boundaries learned by the hyperspheres. Experimental results show that the accuracy of the classification boundaries learned by the proposed method reaches 0.880 on the synthetic datasets. On the highly imbalanced dataset with an imbalance ratio of 87.8:1, the method achieves a classification accuracy of 0.731, outperforming the competing methods by a significant margin. Moreover, the time consumption of the proposed method does not increase exponentially, which makes it suitable for large-scale classification. We find that non-linear kernels are better at focusing on global regions, while conformal transformation helps them better perceive sub-regions. Conformal transformation helps in observing the regions that contain hard-to-observe minority classes.
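As a rough illustration of what a conformal transformation of a kernel can look like in general, a base kernel K can be rescaled as K̃(x, y) = c(x) K(x, y) c(y) with a positive factor c. The sketch below is not the authors' method, which this preview does not specify. It assumes an RBF base kernel and a hypothetical factor c built from Gaussians centred on a set of anchor points; the function names and the parameters `anchors`, `gamma`, and `tau` are illustrative assumptions.

```python
import numpy as np


def sq_dists(X, Y):
    """Pairwise squared Euclidean distances between rows of X and rows of Y."""
    d = (
        np.sum(X**2, axis=1)[:, None]
        + np.sum(Y**2, axis=1)[None, :]
        - 2.0 * X @ Y.T
    )
    return np.maximum(d, 0.0)  # clip tiny negatives caused by round-off


def rbf_kernel(X, Y, gamma=1.0):
    """Base kernel K(x, y) = exp(-gamma * ||x - y||^2)."""
    return np.exp(-gamma * sq_dists(X, Y))


def conformal_factor(X, anchors, tau=1.0):
    """Assumed positive factor c(x) = 1 + sum_k exp(-||x - a_k||^2 / (2 tau^2)).

    The factor is largest near the (hypothetical) anchor points.
    """
    return 1.0 + np.exp(-sq_dists(X, anchors) / (2.0 * tau**2)).sum(axis=1)


def conformal_kernel(X, Y, anchors, gamma=1.0, tau=1.0):
    """Rescaled kernel c(x) * K(x, y) * c(y)."""
    cx = conformal_factor(X, anchors, tau)
    cy = conformal_factor(Y, anchors, tau)
    return cx[:, None] * rbf_kernel(X, Y, gamma) * cy[None, :]
```

Because c is strictly positive, the rescaled Gram matrix remains symmetric and positive semidefinite, so it can stand in for the base kernel in a kernel method. In an imbalanced setting one might, for example, place the anchors on minority-class samples so that the rescaling emphasises their neighbourhood; that choice is an assumption of this sketch.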

Data availability

Data will be made available on request. The datasets used are available at http://archive.ics.uci.edu/ml/datasets/LED+Display+Domain, http://archive.ics.uci.edu/ml/datasets/Ecoli, http://archive.ics.uci.edu/ml/datasets/Yeast, http://archive.ics.uci.edu/ml/machine-learning-databases/wine-quality/, and http://archive.ics.uci.edu/ml/datasets/Poker+Hand.

Funding

This research was supported by the Science and Technology Research Program of Chongqing Municipal Education Commission of China under Grant KJQN202300801, the Natural Science Foundation of Chongqing under Grant cstc2021jcyj-msxmX0518, and the National Social Science Fund Project of China under Grant 19BTJ027.

Author information

Contributions

Jian Zheng proposed the work and wrote the manuscript. Lin Li and Shiyan Wang implemented the source code. Jian Zheng and Huyong Yan analyzed the experimental results.

Corresponding author

Correspondence to Jian Zheng.

Ethics declarations

Competing interests

The authors declare no competing interests.

Ethical approval

This article does not contain any studies with human participants or animals performed by any of the authors.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Zheng, J., Li, L., Wang, S. et al. Binary classification for imbalanced datasets using twin hyperspheres based on conformal method. Cluster Comput (2024). https://doi.org/10.1007/s10586-024-04528-x

  • DOI: https://doi.org/10.1007/s10586-024-04528-x
