Penalty term based suitable fuzzy intuitionistic possibilistic clustering: analyzing high dimensional gene expression cancer database

Abstract

The aim of this paper is to identify co-expressed genes that may contribute to the development of normal or tumor tissue. The paper separates the co-expressed genes of the gene expression dataset GSE25066 into normal samples and tumor samples. Because the dataset has vague boundaries and its clusters share common characteristics, identifying subgroups with similar gene expression is a difficult task. The paper therefore introduces a fuzzy iterative clustering algorithm that incorporates a kernel function, possibilistic c-means, fuzzy memberships, neighborhood information, the median of neighboring objects, and a penalty term. The performance of the proposed clustering technique is demonstrated through a series of experiments on GSE25066, and the clustering results are validated by comparing the resulting classes with the ground truth.
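The abstract does not say which measure is used to compare the resulting classes with the ground truth. As a minimal sketch of one common approach, and not the authors' procedure, the Python below scores hard cluster labels against known classes under the best one-to-one relabeling, since cluster ids are arbitrary. The function name `best_match_accuracy` and the six sample labels are hypothetical and chosen only for illustration.

```python
from itertools import permutations


def best_match_accuracy(predicted, truth):
    """Fraction of samples matching the ground truth under the best
    one-to-one mapping from cluster ids to class labels."""
    if len(predicted) != len(truth):
        raise ValueError("label lists must have the same length")
    if not truth:
        raise ValueError("label lists must not be empty")
    clusters = sorted(set(predicted))
    classes = sorted(set(truth))
    if len(clusters) > len(classes):
        raise ValueError("this sketch assumes no more clusters than classes")
    best = 0
    # Try every assignment of distinct class labels to the cluster ids.
    for perm in permutations(classes, len(clusters)):
        mapping = dict(zip(clusters, perm))
        hits = sum(mapping[p] == t for p, t in zip(predicted, truth))
        best = max(best, hits)
    return best / len(truth)


# Hypothetical labels for six samples (not taken from GSE25066).
predicted = [0, 0, 1, 1, 1, 0]
truth = ["tumor", "tumor", "normal", "normal", "tumor", "tumor"]
accuracy = best_match_accuracy(predicted, truth)
```

Exhaustive search over relabelings is only practical for a handful of clusters, which is the case for a normal versus tumor split; a fuzzy or possibilistic result would first need to be hardened, for example by assigning each sample to its highest-membership cluster.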

Acknowledgments

This work was financially supported by DST India and MOST Israel.

Author information

Corresponding author

Correspondence to S. R. Kannan.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Ethical approval

This article does not contain any studies with human participants or animals performed by any of the authors.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Kannan, S.R., Kashyap, E., Last, M. et al. Penalty term based suitable fuzzy intuitionistic possibilistic clustering: analyzing high dimensional gene expression cancer database. Soft Comput 25, 9839–9857 (2021). https://doi.org/10.1007/s00500-020-05321-9

  • DOI: https://doi.org/10.1007/s00500-020-05321-9

Keywords

  • Fuzzy clustering
  • Big data
  • Neighboring objects
  • Cancer database
  • Penalty term