A Cluster-Based Improved Expectation Maximization Framework for Identification of Somatic Gene Clusters

  • Anuradha Chokka
  • K. Sandhya Rani
Conference paper
Part of the Advances in Intelligent Systems and Computing book series (AISC, volume 1054)


The key objective of this paper is the early identification of cancer. Machine learning algorithms contribute significantly to the early recognition of somatic mutations in cancer patients, so classification and clustering play a vital role in predicting somatic mutation patterns. As the number of gene variants and somatic mutation patterns in a tumor increases, machine learning models become essential for predicting disease patterns effectively. In this work, a novel framework is designed and implemented on somatic cancer datasets. Somatic mutation patterns are first clustered on the related features of gene sequences using the proposed improved expectation maximization (IEM) model. An AdaBoost classifier is then applied to each cluster to classify the somatic mutation patterns. Experimental results show that the proposed IEM clustering algorithm outperforms traditional approaches in terms of cluster quality rate, and the overall classification accuracy across all clusters is also satisfactory.
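The two-stage idea in the abstract (cluster first, then classify within each cluster) can be illustrated with a minimal sketch. It makes several assumptions that are not from the paper: it uses textbook EM for a 1-D Gaussian mixture as a stand-in for the proposed IEM model, a plain discrete AdaBoost over threshold stumps, and randomly generated toy data instead of a somatic mutation dataset. The names `em_cluster_1d`, `fit_adaboost`, and `predict_adaboost` are hypothetical helpers introduced only for this illustration.

```python
import math
import random

def em_cluster_1d(xs, k=2, iters=30):
    """Standard EM for a 1-D Gaussian mixture (a stand-in, NOT the paper's IEM)."""
    n, srt = len(xs), sorted(xs)
    means = [srt[(2 * j + 1) * n // (2 * k)] for j in range(k)]  # quantile init
    mean_all = sum(xs) / n
    variances = [sum((x - mean_all) ** 2 for x in xs) / n] * k
    weights = [1.0 / k] * k
    resp = []
    for _ in range(iters):
        # E-step: posterior probability of each component for each point.
        resp = []
        for x in xs:
            dens = [w * math.exp(-(x - m) ** 2 / (2 * v)) / math.sqrt(2 * math.pi * v)
                    for w, m, v in zip(weights, means, variances)]
            total = sum(dens) or 1e-300
            resp.append([d / total for d in dens])
        # M-step: re-estimate weight, mean, and variance of each component.
        for j in range(k):
            nj = sum(r[j] for r in resp) + 1e-12
            weights[j] = nj / n
            means[j] = sum(r[j] * x for r, x in zip(resp, xs)) / nj
            var = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, xs)) / nj
            variances[j] = max(var, 1e-6)
    # Hard assignment from the final E-step: most probable component per point.
    return [max(range(k), key=lambda j: r[j]) for r in resp]

def fit_adaboost(X, y, rounds=5):
    """Discrete AdaBoost with single-feature threshold stumps; labels are -1/+1."""
    n = len(X)
    w = [1.0 / n] * n
    model = []
    for _ in range(rounds):
        best = None
        for f in range(len(X[0])):
            for thr in sorted({row[f] for row in X}):
                for sign in (1, -1):
                    pred = [sign if row[f] > thr else -sign for row in X]
                    err = sum(wi for wi, p, t in zip(w, pred, y) if p != t)
                    if best is None or err < best[0]:
                        best = (err, f, thr, sign, pred)
        err, f, thr, sign, pred = best
        err = min(max(err, 1e-10), 1 - 1e-10)      # avoid log(0) / division by zero
        alpha = 0.5 * math.log((1 - err) / err)    # weight of this stump
        model.append((alpha, f, thr, sign))
        # Increase the weight of misclassified points, then renormalise.
        w = [wi * math.exp(-alpha * t * p) for wi, p, t in zip(w, pred, y)]
        z = sum(w)
        w = [wi / z for wi in w]
    return model

def predict_adaboost(model, row):
    score = sum(a * (s if row[f] > thr else -s) for a, f, thr, s in model)
    return 1 if score >= 0 else -1

# Toy data (assumption): two groups separated on feature 0; the label rule on
# feature 1 is reversed between the groups, so one global stump cannot fit both.
random.seed(0)
rows, y = [], []
for centre, flip in ((0.0, 1), (10.0, -1)):
    for _ in range(40):
        x0, x1 = random.gauss(centre, 1.0), random.random()
        rows.append((x0, x1))
        y.append(flip * (1 if x1 > 0.5 else -1))

# Stage 1: cluster on feature 0. Stage 2: train one AdaBoost model per cluster.
clusters = em_cluster_1d([r[0] for r in rows], k=2)
accuracies = {}
for c in sorted(set(clusters)):
    idx = [i for i, ci in enumerate(clusters) if ci == c]
    Xc, yc = [rows[i] for i in idx], [y[i] for i in idx]
    model = fit_adaboost(Xc, yc, rounds=5)
    hits = sum(predict_adaboost(model, r) == t for r, t in zip(Xc, yc))
    accuracies[c] = hits / len(idx)  # training accuracy on the toy data only
print(accuracies)
```

The reported values are training accuracies on synthetic points and say nothing about the paper's experimental results. The sketch only shows why clustering before classification can help when the label rule differs between groups.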


Somatic mutations patterns; Improved expectation maximization; Classification and clustering



Copyright information

© Springer Nature Singapore Pte Ltd. 2020

Authors and Affiliations

  1. Department of Computer Science, Sri Padmavati Mahila Visvavidyalayam, Tirupati, India
