Abstract
Aiming at binary classification of highly imbalanced data, this paper proposes a novel twin-hypersphere method with conformal transformation. To provide favorable environments that the hyperspheres can search the region containing the majority class and pay more attention to the region containing the minority class, conformal mapping is put on the original data region. Meanwhile, to tighten classification boundaries learned from the hyperspheres, a gain operation is implemented on the kernels. Experimental results show that the accuracy of classification boundaries learned by the proposed method reaches 0.880 on the synthetic datasets. Results also show and our classification accuracy is 0.731 on the highly imbalanced dataset with imbalanced ratio 87.8:1, which defeated against the competitors with significant advantages. Moreover, time consumption of the proposed method did not exponentially increase so that it is suitable for the classification to a large-scale scenario. We find that non-linear kernels are better at focusing on global regions, while conformal transformation can assist them better perception sub-regions. Conformal transformation is helpful the observation of the regions containing those hard-to-observe minority classes.
Similar content being viewed by others
Data availability
Data will be made available on request. The data is cited at http://archive.ics.uci.edu/ml/datasets/LED+Display+Domain, http://archive.ics.uci.edu/ml/datasets/Ecoli, http://archive.ics.uci.edu/ml/datasets/Yeast, http://archive.ics.uci.edu/ml/machine-learning-databases/wine-quality/, http://archive.ics.uci.edu/ml/datasets/Poker+Hand
References
Mallikarjuna, C., Sivanesan, S.: Question classification using limited labeled data. Inf. Process. Manage. 59(6), 1–15 (2022)
Tang, Bo., He, H.: GIR-based ensemble sampling approaches for imbalanced learning. Pattern Recogn. 71, 306–319 (2017)
Furundzic, D., Stankovic, S., Jovicic, S.T., Punišić, S., Subotić, M.: Distance based resampling of imbalanced classes: With an application example of speech quality assessment. Eng. Appl. Artif. Intell. 64, 440–461 (2017)
Feng, L., Wang, H., Jinl, B.: Learning a distance Metric by balancing KL-divergence for imbalanced datasets. IEEE transaction on Systems, Man, and Cybernetics: Systems 49(12), 2384–2395 (2019)
Ando, S., Huang, C.Y.: Deep over-sampling framework for classifying imbalanced data. Proc. Joint Eur. Conf. Mach. Learn. Knowl. Discovery Databases, 770–785 (2017).
Dong, Q., Gong, S., Zhu, X.: Class rectification hard mining for imbalanced deep learning. Proc. IEEE Int. Conf. Comput. Vis., 1869–1878 (2017).
Douglas, P., Fuller, C.M.: Expressing uncertainty in information systems analytics research: a demonstration of bayesian analysis applied to binary classification problems. Inf. Process. Manage. 60(1), 1–17 (2022)
Lin, N., Sihui, Fu., Lin, X., Wang, L.: Multi-label emotion classification based on adversarial multi-task learning. Inf. Process. Manage. 59(6), 1–20 (2022)
Fernández, A., García, S., del José, M.J., Herrera, F.: A study of the behaviour of linguistic fuzzy rule based classification systems in the framework of imbalanced data-sets. Fuzzy Sets Syst. 159(18), 2378–2398 (2008)
Wang, N., Liang, R., Zhao, X.: Cost-sensitive hypergraph learning with F-measure optimization. IEEE Transactions on Cybernetics 3, 1–12 (2021)
Xiaoyuan, J., Xinyu, Z., Zhu Xiaoke, Wu., Fei, Y.X., Yang, G., Shiguang, S., JingYu, Y.: Multiset feature learning for highly imbalanced data classification. IEEE Trans. Pattern Anal. Mach. Intell. 43(1), 139–155 (2021)
Triguero, I., Galar, M., Vluymans, S., Cornelis, C., Bustince, H.: Evolutionary undersampling for imbalanced big data classification. Proc. IEEE Congr. Evol. Comput., 715–722 (2015).
Arkok, B., Zeki, A.M.: Classification of Quranic Topics Using SMOTE Technique. 2021 International Conference of Modern Trends in Information and Communication Technology Industry (MTICTI), IEEE, 1–4 (2021).
Chawla, N.V., Bowyer, K.W., Hall, L.J., Philip Kegelmeyer, W.: SMOTE: synthetic minority over-sampling technique. Journal of artificial intelligence research 16, 321–357 (2002)
Yitian, Xu., Zhiji, Y., Xianli, P.: A novel twin support-vector machine with pinball loss. IEEE transactions on neural networks and learning system 28(2), 359–370 (2017)
Datta, S., Das, S.: Near-Bayesian support vector machines for imbalanced data classification with equal or unequal misclassification costs. Neural Netw. 70, 39–52 (2015)
Yitian, X.: Maximum margin of twin spheres support vector machine for imbalanced data classification. IEEE Transactions on Cybernetics 47(6), 1540–1550 (2017)
Benshan, M., Yitian, X.: Multi-task least squares twin support vector machine for classification. Neurocomputing 338, 26–33 (2019)
Şalk, Y., Uzun B., Çevikalp H., Sarıbaş, H.: Anomaly Detection with Deep Compact Hypersphere. 30th Signal Processing and Communications Applications Conference (SIU), IEEE, 1–4 (2022).
Schroff, F., Kalenichenko, D., Philbin, J.: FaceNet: A Unified embedding for face recognition and clustering. Proc. IEEE Conf. Comput. Vis. Pattern Recognit., 815–823 (2015).
Yueqi, D., Lei, C., Jiwen, L., Jie, Z.: Deep embedding learning with discriminative sampling policy. Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit., 4964–4973 (2019).
Hu Junlin, Lu., Jiwen, T.Y.: Sharable and individual multi-view metric learning. IEEE Trans. Pattern Anal. Mach. Intell.Pattern Anal. Mach. Intell. 40(9), 2281–2288 (2018)
Gao, H., Huang, W., Duan, Y.: Research on cost-driven services composition in an uncertain environment. J. Internet Technol. 20(3), 755–769 (2019)
Oğuz, Ç., Yağanoğlu, M.: Detection of COVID-19 using deep learning techniques and classification methods. Inf. Process. Manage. 59(5), 1–12 (2022)
Yintao, Y., Rui, M., Yili, W., Xin, W.: Contrastive Graph Convolutional Networks with adaptive augmentation for text classification. Inf. Process. Manage. 59(4), 1–16 (2022)
Tabassum, N., Menon, S., Jastrzębska, A.: Time-series classification with SAFE: simple and fast segmented word embedding-based neural time series classifier. Inf. Process. Manage. 59(5), 1–17 (2022)
Chen Haihua, Wu., Lei, C.J., Wei, Lu., Junhua, D.: A comparative study of automated legal text classification using random forests and deep learning. Inf. Process. Manage. 59(2), 1–18 (2022)
Muñoz, S., Iglesias, C.A.: A text classification approach to detect psychological stress combining a lexicon-based feature framework with distributional representations. Inf. Process. Manage. 59(5), 1–13 (2022)
Bzdok, D., Krzywinski, M., Altman, N.: Machine learning: supervised methods. Nat. Methods 15(1), 5–6 (2018)
Xianfeng, Gu., Shingtung, Y.: Computational Conformal Geometry. Springer (2020)
Chunxiuzi, L., Fengyang, S., Qingrui, N.: A Novel Graphic Bending Transformation on Benchmark. 2020 IEEE International Conference on Systems, Man, and Cybernetics (SMC), 11–14 (2020).
Guoxu, F., Jun, H., Hongbo, S.: A New Ray Tracing Method Based on Piecewise Conformal Transformations. IEEE Transactions on Microwave Theory and Techniques, 1–1 (2022).
Burges, C.J.C.: Geometry and invariance in kernel based methods [M]. Springer (1999)
Xinjun, P., Dong, X.: A twin-hypersphere support vector machine classifier and the fast learning algorithm. Information Science 221, 12–27 (2013)
Demsar, J.: Statistical comparisons of classifiers over multiple data sets. J. Mach. Learn. Res. 7, 1–30 (2006)
Khan, S., Hayat, M., Zamir, W., Shen, J., Shao, L.: Striking the right balance with uncertainty. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 103–112 (2019).
Chengjian, F., Yujie, Z., Huang, W.: Exploring classification equilibrium in long-tailed object detection. In International Conference on Computer Vision, 3417–3426 (2021).
Zongyong, D., Hao, L., Yaoxing, W., Chenyang, W., Zekuan, Y.: PML: Progressive margin loss for long-tailed age classification. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 10503–10512 (2021).
Wang, P., Han, K., Wei, X., Zhang, L., Wang, L.: Contrastive learning based hybrid networks for long-tailed image classification. Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit., 943–952 (2021).
Cui, Y., Jia, M., Lin, T., Song, Y., Belongie, S.: Class-balanced loss based on effective number of samples. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 9268–9277 (2019).
Jamal, M., Brown, M.A., Yang, M.-H., Wang, L., Gong, B.: Rethinking class-balanced methods for long-tailed visual recognition from a domain adaptation perspective. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 7610–7619 (2020).
Jialian, Wu., Liangchen, S., Qian, Z.: ForestDet: large-vocabulary long-tailed object detection and instance segmentation. IEEE Trans. Multimedia 24, 3693–3705 (2022)
Xingquan, Z., Xindong, W.: Class noise vs. attribute noise: a quantitative study of their impacts. Artif. Intell. Rev. 22, 177–210 (2004)
Funding
The research funding is supported by the Science and Technology Research Program of Chongqing Municipal Education Commission of China under Grant KJQN202300801. And the Natural Science Foundation of Chongqing under Grant cstc2021jcyj-msxmX0518. And the National Social Science Fund Project of China under Grant 19BTJ027.
Author information
Authors and Affiliations
Contributions
Jian Zheng proposed and wrote the manuscript. Lin Li and Shiyan Wang performed source code. Jian Zheng and Huyong Yan analyzed the experimental results.
Corresponding author
Ethics declarations
Competing interests
The authors declare no competing interests.
Ethical approval
This article does not contain any studies with human participants or animals performed by any of the authors.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Zheng, J., Li, L., Wang, S. et al. Binary classification for imbalanced datasets using twin hyperspheres based on conformal method. Cluster Comput (2024). https://doi.org/10.1007/s10586-024-04528-x
Received:
Revised:
Accepted:
Published:
DOI: https://doi.org/10.1007/s10586-024-04528-x