A Genetic Algorithm-Based Clustering Approach for Selecting Non-redundant MicroRNA Markers from Microarray Expression Data

  • Monalisa Mandal
  • Anirban Mukhopadhyay
  • Ujjwal Maulik
Conference paper
Part of the Springer Proceedings in Mathematics & Statistics book series (PROMS, volume 225)


In recent years, many studies have investigated the involvement of microRNAs (miRNAs) in the pathways of different types of cancer. This research shows that miRNA expression profiles help distinguish cancerous tissue from normal tissue and discriminate between cancer subtypes. In this article, miRNA expression data of different cancer types are analyzed using a novel multiobjective genetic algorithm-based feature selection method to find a reduced, non-redundant set of miRNA markers. Three objectives, viz. classification accuracy, a cluster validity index called the Davies–Bouldin (DB) index, and the number of miRNAs encoded in a chromosome of the genetic algorithm, are optimized simultaneously. The classification accuracy is maximized to obtain the most relevant set of miRNAs. The DB index is optimized to cluster the miRNAs and choose a representative miRNA from each cluster, which yields a non-redundant set of miRNA markers. Finally, the number of miRNAs is minimized to yield a reduced set of selected miRNAs. The performance of the proposed genetic algorithm-based method is compared with that of other existing feature selection techniques. The proposed technique outperforms the other methods on most of the performance metrics. Lastly, the obtained miRNA markers are reported together with their associated diseases and numbers of target mRNAs.
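The three objectives named in the abstract can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the paper: the chromosome is a binary mask over miRNAs, relevance is scored by leave-one-out 1-nearest-neighbour accuracy (a stand-in for whatever classifier the authors use), and redundancy is scored by assigning every miRNA expression profile to its nearest selected miRNA and computing the DB index of that partition. The function names, `SAMPLES`, and `LABELS` are hypothetical.

```python
import math

def dist(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def loo_1nn_accuracy(samples, labels, selected):
    """Leave-one-out 1-NN accuracy using only the selected miRNA columns."""
    proj = [[row[j] for j in selected] for row in samples]
    correct = 0
    for i, p in enumerate(proj):
        others = [k for k in range(len(proj)) if k != i]
        nearest = min(others, key=lambda k: dist(p, proj[k]))
        correct += labels[nearest] == labels[i]
    return correct / len(proj)

def davies_bouldin(profiles, selected):
    """DB index of the partition induced by the selected miRNAs.

    Assumption: each selected miRNA is a cluster representative and every
    miRNA profile joins its nearest representative. Lower is better.
    """
    if len(selected) < 2:
        return math.inf  # the index is undefined for fewer than two clusters
    reps = [profiles[j] for j in selected]
    clusters = [[] for _ in reps]
    for p in profiles:
        nearest = min(range(len(reps)), key=lambda c: dist(p, reps[c]))
        clusters[nearest].append(p)
    clusters = [m for m in clusters if m]  # drop clusters left empty by ties
    if len(clusters) < 2:
        return math.inf
    centroids = [[sum(col) / len(m) for col in zip(*m)] for m in clusters]
    scatter = [sum(dist(p, c) for p in m) / len(m)
               for m, c in zip(clusters, centroids)]
    total = 0.0
    for i in range(len(clusters)):
        worst = 0.0
        for j in range(len(clusters)):
            if i != j:
                sep = dist(centroids[i], centroids[j])
                ratio = math.inf if sep == 0 else (scatter[i] + scatter[j]) / sep
                worst = max(worst, ratio)
        total += worst
    return total / len(clusters)

def evaluate(chromosome, samples, labels):
    """Return (accuracy to maximize, DB index to minimize, miRNA count to minimize)."""
    selected = [j for j, bit in enumerate(chromosome) if bit]
    if not selected:
        return 0.0, math.inf, 0
    profiles = list(zip(*samples))  # one expression profile per miRNA
    return (loo_1nn_accuracy(samples, labels, selected),
            davies_bouldin(profiles, selected),
            len(selected))

# Toy data (made up): 6 samples x 4 miRNAs, two classes.
SAMPLES = [
    [1.0, 1.1, 5.0, 0.2], [1.2, 1.0, 5.1, 0.9], [0.9, 1.2, 4.9, 0.5],
    [3.0, 3.1, 5.0, 0.8], [3.2, 2.9, 5.2, 0.1], [2.9, 3.0, 4.8, 0.6],
]
LABELS = [0, 0, 0, 1, 1, 1]

if __name__ == "__main__":
    print(evaluate([1, 0, 1, 0], SAMPLES, LABELS))
```

A multiobjective genetic algorithm would call something like `evaluate` on every chromosome in the population and keep the solutions that no other solution beats on all three objectives at once. The selection, crossover, and mutation operators are outside the scope of this sketch.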


MicroRNA, Multiobjective optimization, Genetic algorithm, Clustering, Davies–Bouldin index



Copyright information

© Springer Nature Singapore Pte Ltd. 2018

Authors and Affiliations

  • Monalisa Mandal (1, corresponding author)
  • Anirban Mukhopadhyay (2)
  • Ujjwal Maulik (3)
  1. Department of Computer and Information Science, Norwegian University of Science and Technology (NTNU), Trondheim, Norway
  2. Department of Computer Science and Engineering, University of Kalyani, Kalyani, India
  3. Department of Computer Science and Engineering, Jadavpur University, Kolkata, India
