Abstract
Biomedical data are high-dimensional, structurally complex, and susceptible to noise, which makes classification highly challenging. The twin support vector machine (TSVM) is a machine learning algorithm that can effectively solve pattern recognition problems. To mitigate the negative impact of noise, researchers have combined fuzzy set theory with TSVM, using fuzzy membership to describe the influence of each sample on the construction of the optimal hyperplane, thus extending TSVM to the fuzzy twin support vector machine (FTSVM). In this paper, a dissimilarity measure based on data distribution is introduced into the fuzzy membership assignment process, and a novel fuzzy membership assignment strategy is designed to reduce the negative impact of noise in biomedical data. Rather than relying on geometric distance, this strategy takes data distribution as the primary factor in measuring dissimilarity between samples and then constructs a heuristic function to assign fuzzy membership to each sample. Combining this strategy with TSVM, this paper proposes a fuzzy twin support vector machine based on dissimilarity measure (DFTSVM), which effectively solves classification problems with noise and shows excellent generalization performance on biomedical data. Moreover, DFTSVM employs a coordinate descent strategy with active-set shrinking to reduce computational complexity, which significantly improves the training speed of the model. Experiments are conducted on 14 biomedical datasets to compare DFTSVM with 10 heterogeneous machine learning classification algorithms and four homologous algorithms. The results demonstrate that DFTSVM outperforms the other algorithms in classification performance on biomedical data. It also exhibits excellent generalization performance in noisy environments, and its advantages in generalization and noise robustness become more prominent as the noise rate increases.
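The idea of measuring dissimilarity through data distribution rather than geometric distance can be illustrated with a minimal sketch. The abstract does not give the exact measure or heuristic function, so everything below is an assumption for illustration: the dissimilarity is a mass-based measure in the style of m_p-dissimilarity (two samples are more dissimilar when the per-feature interval spanning them contains more training data), and the membership rule is a hypothetical linear decay on the mean dissimilarity to the other samples of the same class. The names `mass_dissimilarity` and `fuzzy_memberships` and the parameters `p` and `floor` are not from the paper.

```python
from bisect import bisect_left, bisect_right


def mass_dissimilarity(x, y, sorted_columns, n, p=2.0):
    """Data-dependent dissimilarity (assumed, m_p-style).

    For each feature, take the fraction of the n training values that lie in
    the closed interval spanned by x and y, then combine the fractions with a
    power mean of order p. No geometric distance is used.
    """
    total = 0.0
    for xi, yi, col in zip(x, y, sorted_columns):
        lo, hi = min(xi, yi), max(xi, yi)
        # Count of training values inside [lo, hi] on this feature.
        count = bisect_right(col, hi) - bisect_left(col, lo)
        total += (count / n) ** p
    return (total / len(sorted_columns)) ** (1.0 / p)


def fuzzy_memberships(samples, p=2.0, floor=0.05):
    """Hypothetical heuristic, not the paper's function.

    Membership decreases linearly with a sample's mean dissimilarity to the
    other samples of the same class and is clipped below at `floor`.
    """
    n = len(samples)
    if n < 2:
        return [1.0] * n
    # Sort each feature column once so interval counts are binary searches.
    sorted_columns = [sorted(col) for col in zip(*samples)]
    mean_dis = []
    for i, x in enumerate(samples):
        others = [
            mass_dissimilarity(x, y, sorted_columns, n, p)
            for j, y in enumerate(samples)
            if j != i
        ]
        mean_dis.append(sum(others) / len(others))
    lo, hi = min(mean_dis), max(mean_dis)
    span = hi - lo
    if span == 0:
        return [1.0] * n
    return [max(floor, 1.0 - (d - lo) / span) for d in mean_dis]


if __name__ == "__main__":
    # Four clustered samples of one class plus one isolated sample.
    one_class = [(0.0, 0.3), (0.1, 0.0), (0.2, 0.2), (0.3, 0.1), (5.0, 5.0)]
    for point, s in zip(one_class, fuzzy_memberships(one_class)):
        print(point, round(s, 3))
```

In this toy example the isolated sample (5.0, 5.0) receives the floor membership, so it would contribute least to the hyperplane if these values were used to scale the per-sample penalty in a fuzzy TSVM. Because the measure depends only on how many samples fall between two points, it is effectively rank-based within each feature, which is one way such a measure differs from Euclidean distance.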
Data Availability
The datasets used in this paper come from the UCI Machine Learning Repository, which is publicly available at https://archive.ics.uci.edu/.
Acknowledgements
The authors would like to thank the editors and the anonymous referees for their professional comments, which improved the quality of the manuscript. This work was supported in part by the National Natural Science Foundation of China (Grant Nos. 12271211, 12071179), the National Natural Science Foundation of Fujian Province (Grant Nos. 2021J01861, 2020J01710), the Youth Innovation Fund of Xiamen City (Grant No. 3502Z20206020), the Open Fund of Digital Fujian Big Data Modeling and Intelligent Computing Institute, and the Pre-Research Fund of Jimei University.
Ethics declarations
Conflict of interest
All authors declare that they have no conflict of interest.
Informed Consent
Informed consent was obtained from all individual participants included in the study.
About this article
Cite this article
Qiu, J., Xie, J., Zhang, D. et al. A Fuzzy Twin Support Vector Machine Based on Dissimilarity Measure and Its Biomedical Applications. Int. J. Fuzzy Syst. (2024). https://doi.org/10.1007/s40815-024-01725-z
DOI: https://doi.org/10.1007/s40815-024-01725-z