A Comparative Study on Different Versions of Multi-Objective Genetic Algorithm for Simultaneous Gene Selection and Sample Categorization

Multi-Objective Optimization

Abstract

Selecting genes from microarray gene expression datasets and clustering samples into groups are important data mining tasks for disease identification. Selecting more interpretable genes from a gene expression dataset is an essential data-preprocessing step that supports the study of cancer. Gene selection during sample clustering is inherently difficult because there is no obvious criterion to guide the search. Simultaneous gene selection and sample clustering is a two-way data analysis technique that has recently gained research attention. Traditional clustering techniques handle noisy data poorly, so clustering algorithms that work on relevant, noise-free data are more desirable. Selecting target genes before sample clustering is therefore essential, and it is effective when both tasks are performed simultaneously. In this chapter, an optimal gene subset is selected and sample clustering is performed simultaneously using a Multi-Objective Genetic Algorithm (MOGA). Different versions of MOGA are employed to choose the optimal gene subset, and the natural number of sample clusters is obtained automatically at the end of the process. The non-dominated sorting genetic algorithm (NSGA), the strength Pareto evolutionary algorithm (SPEA), and its modified version SPEA2 are applied for this purpose. The methods use nonlinear hybrid uniform cellular automata to generate the initial population, a tournament selection strategy, two-point crossover, and a suitable jumping gene mutation mechanism to maintain diversity in the population. They use the mutual correlation coefficient and internal and external cluster validation indices as objective functions to find the non-dominated solutions. To compute the cluster validation indices, a clustering algorithm is applied to the data subset associated with each chromosome in the population.
After the genetic algorithm converges, the best of the non-dominated solutions is identified; it provides the important genes and categorizes the samples into clusters. The experimental results support the correctness of the proposed simultaneous gene selection and sample categorization method. The quality of the clusters obtained with the different genetic algorithms is assessed by comparing various cluster validation indices.
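
The notion of a non-dominated solution used above can be illustrated with a minimal Python sketch. It is not the chapter's implementation: the function names and the objective values are hypothetical, and it assumes every objective is expressed so that smaller is better (an index where larger is better would be negated first).

```python
from typing import List, Sequence


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """True if objective vector a Pareto-dominates b (all objectives minimized)."""
    no_worse = all(x <= y for x, y in zip(a, b))
    strictly_better = any(x < y for x, y in zip(a, b))
    return no_worse and strictly_better


def non_dominated(objectives: List[Sequence[float]]) -> List[int]:
    """Indices of the solutions that no other solution dominates."""
    return [
        i
        for i, fi in enumerate(objectives)
        if not any(dominates(fj, fi) for j, fj in enumerate(objectives) if j != i)
    ]


# Hypothetical scores for five candidate gene subsets (made-up numbers):
# (gene-gene correlation, cluster validation index), both to be minimized.
scores = [(0.20, 1.5), (0.35, 0.9), (0.25, 1.6), (0.50, 0.8), (0.35, 1.0)]
front = non_dominated(scores)
print(front)  # [0, 1, 3]
```

In this toy example the subsets at indices 2 and 4 are dropped because another subset is at least as good on both objectives and strictly better on one. The pairwise comparison costs quadratic time in the population size, which is acceptable for a sketch; NSGA, SPEA and SPEA2 each build their own ranking and fitness assignment on top of this dominance relation.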

Corresponding author

Correspondence to Sunanda Das.

Copyright information

© 2018 Springer Nature Singapore Pte Ltd.

About this chapter

Cite this chapter

Das, A.K., Das, S. (2018). A Comparative Study on Different Versions of Multi-Objective Genetic Algorithm for Simultaneous Gene Selection and Sample Categorization. In: Mandal, J., Mukhopadhyay, S., Dutta, P. (eds) Multi-Objective Optimization. Springer, Singapore. https://doi.org/10.1007/978-981-13-1471-1_11

  • DOI: https://doi.org/10.1007/978-981-13-1471-1_11

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-13-1470-4

  • Online ISBN: 978-981-13-1471-1

  • eBook Packages: Computer Science (R0)
