A robust fuzzy approach for gene expression data clustering

Abstract

In the big data era, clustering is one of the most popular data mining methods. Most clustering algorithms suffer from difficulties such as the lack of automatic cluster number determination, poor clustering precision, inconsistent results across datasets, and parameter dependence. A new fuzzy autonomous clustering solution, the Meskat-Mahmudul (MM) clustering algorithm, is proposed to address parameter-free automatic cluster number determination and clustering accuracy. The MM clustering algorithm finds the exact number of clusters using the average silhouette method on multivariate mixed-attribute datasets, including real-time gene expression datasets and data containing missing values, noise, and outliers. The Meskat-Mahmudul Extended K-Means (MMK) clustering algorithm enhances the K-Means algorithm with automatic cluster discovery and runtime cluster placement. Several validation methods are used to evaluate the clusters and to confirm optimal cluster partitioning and quality. Several datasets are used to compare the performance of the proposed algorithms with that of other algorithms in terms of time complexity and clustering efficiency. Finally, the MM and MMK clustering algorithms were found to be superior to conventional algorithms.
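The average silhouette method mentioned in the abstract can be illustrated with a small, generic sketch. This is not the MM or MMK algorithm, whose details are not given on this page. It only shows the general idea of running a clustering for several candidate values of k and keeping the k with the highest mean silhouette coefficient (Rousseeuw 1987). The function names, the toy data, and the deterministic farthest-first initialization are all assumptions made for illustration.

```python
import math

def dist(a, b):
    """Euclidean distance between two equal-length tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def farthest_first_centers(points, k):
    """Deterministic seeding (an assumption of this sketch): start at the
    first point, then repeatedly add the point farthest from its nearest
    already-chosen center."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(dist(p, c) for c in centers)))
    return centers

def assign(points, centers):
    """Index of the nearest center for each point."""
    return [min(range(len(centers)), key=lambda c: dist(p, centers[c]))
            for p in points]

def kmeans(points, k, max_iters=100):
    """Plain K-Means (Lloyd iterations); returns one label per point."""
    centers = farthest_first_centers(points, k)
    for _ in range(max_iters):
        labels = assign(points, centers)
        new_centers = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centers.append(tuple(sum(v) / len(members) for v in zip(*members)))
            else:
                new_centers.append(centers[c])  # keep the center of an empty cluster
        if new_centers == centers:
            break
        centers = new_centers
    return assign(points, centers)

def average_silhouette(points, labels):
    """Mean silhouette coefficient s(i) = (b - a) / max(a, b) over all points."""
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    if len(clusters) < 2:
        return -1.0  # undefined for a single cluster; treated as worst case here
    total = 0.0
    for p, lab in zip(points, labels):
        own = clusters[lab]
        if len(own) == 1:
            continue  # s(i) = 0 for singleton clusters by convention
        # a: mean distance to the other members of the point's own cluster
        a = sum(dist(p, q) for q in own) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster
        b = min(sum(dist(p, q) for q in other) / len(other)
                for m, other in clusters.items() if m != lab)
        if max(a, b) > 0:
            total += (b - a) / max(a, b)
    return total / len(points)

def choose_k(points, candidates):
    """Score each candidate k and return the best k plus all scores."""
    scores = {k: average_silhouette(points, kmeans(points, k)) for k in candidates}
    return max(scores, key=scores.get), scores

# Toy data: three well-separated groups of four points each.
points = [(0, 0), (0, 1), (1, 0), (1, 1),
          (10, 0), (10, 1), (11, 0), (11, 1),
          (0, 10), (0, 11), (1, 10), (1, 11)]
best_k, scores = choose_k(points, range(2, 6))
print(best_k, scores)
```

On this toy data the three-group partition receives the highest average silhouette, so `choose_k` selects k = 3. Real gene expression data would additionally need the handling of missing values, noise, and mixed attributes that the abstract describes, which this sketch does not attempt.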

Author information

Corresponding author

Correspondence to Meskat Jahan.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

About this article

Cite this article

Jahan, M., Hasan, M. A robust fuzzy approach for gene expression data clustering. Soft Comput 25, 14583–14596 (2021). https://doi.org/10.1007/s00500-021-06397-7

Keywords

  • FCM
  • K-Means
  • Fuzzy clustering
  • Clustering algorithm
  • Data mining