A Comparative Study on Different Versions of Multi-Objective Genetic Algorithm for Simultaneous Gene Selection and Sample Categorization

Das, Asit Kumar; Das, Sunanda

doi:10.1007/978-981-13-1471-1_11



Abstract

Gene selection from microarray gene expression datasets and clustering of samples into different groups are important data mining tasks for disease identification. Selecting the more interpretable genes from a gene expression dataset is an essential data-preprocessing step that supports the study of cancer. Selecting genes during sample clustering is inherently difficult because there is no obvious criterion to guide the search. Simultaneous gene selection and sample clustering is a two-way data analysis technique that has recently gained attention in the research community. Traditional clustering techniques cannot handle noisy data properly, so effective clustering algorithms that operate on relevant, noise-free data are desirable. Selecting target genes before clustering the samples is therefore essential, and it is particularly effective when both tasks are performed simultaneously. In this chapter, an optimal gene subset is selected and the samples are clustered simultaneously using a Multi-Objective Genetic Algorithm (MOGA). Different versions of MOGA are employed to choose the optimal gene subset, and the natural number of sample clusters is obtained automatically at the end of the process. The Non-dominated Sorting Genetic Algorithm (NSGA), the Strength Pareto Evolutionary Algorithm (SPEA), and its modified version SPEA2 are applied for this purpose. The methods use nonlinear hybrid uniform cellular automata to generate the initial population, a tournament selection strategy, a two-point crossover operation, and a suitable jumping gene mutation mechanism to maintain diversity in the population. The mutual correlation coefficient and internal and external cluster validation indices serve as the objective functions for finding the non-dominated solutions. To compute the cluster validation indices, a clustering algorithm is applied to the data subset associated with each chromosome in the population to obtain its clusters.
After the genetic algorithm converges, the best of the non-dominated solutions is identified; it provides the important genes and categorizes the samples into clusters. The experimental results demonstrate the correctness of the proposed simultaneous gene selection and sample categorization method. The optimality of the clusters obtained with the different genetic algorithms is assessed by comparing various cluster validation indices.
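The abstract states that the initial population is generated with nonlinear hybrid uniform cellular automata but does not give the rules. The following is a minimal Python sketch under stated assumptions: each chromosome is a bit string with one bit per gene (1 = gene selected), each cell follows its own elementary (Wolfram-numbered) rule with null boundaries, and successive CA states become successive chromosomes. The function names and the rule numbers used in the example (30, 90, 150) are hypothetical choices, not the authors' configuration; rule 30 is nonlinear while 90 and 150 are linear, so mixing them gives a hybrid automaton.

```python
def ca_step(state, rules):
    """One synchronous update of a hybrid elementary cellular automaton.

    Cell i is updated with rules[i], a Wolfram rule number in 0..255.
    Cells beyond either end are treated as 0 (null boundary).
    """
    n = len(state)
    nxt = []
    for i in range(n):
        left = state[i - 1] if i > 0 else 0
        centre = state[i]
        right = state[i + 1] if i < n - 1 else 0
        neighbourhood = (left << 2) | (centre << 1) | right  # value 0..7
        nxt.append((rules[i] >> neighbourhood) & 1)          # look up the rule bit
    return nxt


def ca_population(seed, rules, pop_size):
    """Return pop_size binary chromosomes taken from successive CA states."""
    if len(seed) != len(rules):
        raise ValueError("seed and rules must have the same length")
    population = []
    state = list(seed)
    for _ in range(pop_size):
        state = ca_step(state, rules)
        population.append(state)
    return population


# Hypothetical example: 8 genes, cell rules cycling through 30, 90 and 150.
rules = [30, 90, 150] * 2 + [30, 90]
population = ca_population([0, 0, 0, 1, 0, 0, 0, 0], rules, pop_size=5)
```

An all-zero state selects no genes, so a practical implementation would likely need to discard or repair such chromosomes.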
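The variation operators named in the abstract (tournament selection, two-point crossover and jumping gene mutation) can be sketched on binary gene-selection chromosomes as follows. The abstract does not define the jumping gene operator; this sketch assumes a cut-and-paste transposition, in which a randomly chosen segment is moved to a new position in the chromosome. Tournament selection here compares a single scalar fitness where lower is better, such as a Pareto rank. All function names and parameters are hypothetical.

```python
import random


def tournament_select(population, fitness, k, rng):
    """Return the best of k randomly drawn individuals (lower fitness is better)."""
    contenders = rng.sample(range(len(population)), k)
    best = min(contenders, key=lambda i: fitness[i])
    return population[best]


def two_point_crossover(parent_a, parent_b, rng):
    """Swap the segment between two distinct random cut points."""
    n = len(parent_a)
    i, j = sorted(rng.sample(range(1, n), 2))
    child_a = parent_a[:i] + parent_b[i:j] + parent_a[j:]
    child_b = parent_b[:i] + parent_a[i:j] + parent_b[j:]
    return child_a, child_b


def jumping_gene_mutation(chromosome, length, rng):
    """Cut-and-paste transposition: move a random segment to a new position."""
    n = len(chromosome)
    start = rng.randrange(0, n - length + 1)
    segment = chromosome[start:start + length]
    rest = chromosome[:start] + chromosome[start + length:]
    insert_at = rng.randrange(0, len(rest) + 1)
    return rest[:insert_at] + segment + rest[insert_at:]


# Hypothetical usage on two 8-gene chromosomes.
rng = random.Random(42)
parents = [[1, 0, 1, 1, 0, 0, 1, 0], [0, 1, 1, 0, 1, 0, 0, 1]]
child_a, child_b = two_point_crossover(parents[0], parents[1], rng)
mutant = jumping_gene_mutation(child_a, length=3, rng=rng)
```

Because cut-and-paste transposition only relocates bits, it reshuffles which genes are selected without changing how many are selected.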
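The abstract lists the mutual correlation coefficient as one objective but does not give its formula. A plausible reading, assumed here, is the mean absolute Pearson correlation over all pairs of genes selected by a chromosome; minimising it would discourage redundant genes. The function names and the data layout (one list of expression values per gene) are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def mutual_correlation(expression, chromosome):
    """Mean |r| over all pairs of genes selected by a binary chromosome.

    expression[g] holds the expression values of gene g across all samples.
    Returns 0.0 when fewer than two genes are selected.
    """
    selected = [g for g, bit in enumerate(chromosome) if bit == 1]
    pairs = [(g, h) for i, g in enumerate(selected) for h in selected[i + 1:]]
    if not pairs:
        return 0.0
    return sum(abs(pearson(expression[g], expression[h])) for g, h in pairs) / len(pairs)
```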
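The abstract does not say which internal cluster validation index is used. As one well-known example, Rousseeuw's silhouette width can be computed on the samples restricted to the selected genes once a clustering algorithm has assigned labels. For sample i, a(i) is its mean distance to the other members of its own cluster, b(i) is the smallest mean distance to the members of any other cluster, and s(i) = (b(i) - a(i)) / max(a(i), b(i)); values near 1 indicate compact, well-separated clusters. This is a minimal sketch, and the function name is hypothetical.

```python
import math


def silhouette(points, labels):
    """Mean silhouette width over all samples.

    points[i] is the vector of selected-gene values for sample i and
    labels[i] is its cluster id. Samples in singleton clusters get s(i) = 0.
    """
    if len(set(labels)) < 2:
        raise ValueError("silhouette needs at least two clusters")
    total = 0.0
    for i, p in enumerate(points):
        # Group distances from sample i to every other sample by cluster label.
        dists = {}
        for j, q in enumerate(points):
            if i != j:
                dists.setdefault(labels[j], []).append(math.dist(p, q))
        own = dists.get(labels[i], [])
        if not own:
            continue  # singleton cluster contributes s(i) = 0
        a = sum(own) / len(own)
        b = min(sum(d) / len(d) for c, d in dists.items() if c != labels[i])
        denom = max(a, b)
        if denom > 0:
            total += (b - a) / denom
    return total / len(points)
```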
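NSGA ranks the population into successive Pareto fronts. The sketch below uses the fast non-dominated sorting procedure introduced with NSGA-II, which produces the same fronts as the original NSGA ranking but more efficiently. All objectives are assumed to be minimised, so a maximised index would be negated first; the function names are hypothetical.

```python
def dominates(a, b):
    """True if objective vector a Pareto-dominates b (all objectives minimised)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def non_dominated_sort(objectives):
    """Split solution indices into Pareto fronts; front 0 is non-dominated."""
    n = len(objectives)
    dominated_set = [[] for _ in range(n)]  # solutions that i dominates
    domination_count = [0] * n              # number of solutions dominating i
    fronts = [[]]
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            if dominates(objectives[i], objectives[j]):
                dominated_set[i].append(j)
            elif dominates(objectives[j], objectives[i]):
                domination_count[i] += 1
        if domination_count[i] == 0:
            fronts[0].append(i)
    k = 0
    while fronts[k]:
        next_front = []
        for i in fronts[k]:
            for j in dominated_set[i]:
                domination_count[j] -= 1
                if domination_count[j] == 0:
                    next_front.append(j)
        k += 1
        fronts.append(next_front)
    return fronts[:-1]  # drop the trailing empty front
```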
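SPEA2 replaces Pareto ranks with a scalar fitness. Each solution's strength S(i) is the number of solutions it dominates; its raw fitness R(i) is the sum of the strengths of the solutions that dominate it, so R(i) = 0 exactly for non-dominated solutions. A density term D(i) = 1 / (σ_i^k + 2), where σ_i^k is the distance to the k-th nearest neighbour in objective space and k = √N, distinguishes solutions with equal raw fitness. The sketch below computes F(i) = R(i) + D(i) over a single combined set of N solutions and omits SPEA2's archive update and truncation; lower fitness is better, and the function names are hypothetical.

```python
import math


def dominates(a, b):
    """True if objective vector a Pareto-dominates b (all objectives minimised)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def spea2_fitness(objectives):
    """SPEA2 fitness F(i) = R(i) + D(i); non-dominated solutions get F(i) < 1."""
    n = len(objectives)
    # S(i): how many solutions i dominates (dominates(x, x) is always False).
    strength = [sum(dominates(objectives[i], objectives[j]) for j in range(n))
                for i in range(n)]
    # R(i): total strength of the solutions that dominate i.
    raw = [sum(strength[j] for j in range(n) if dominates(objectives[j], objectives[i]))
           for i in range(n)]
    k = int(math.sqrt(n))
    fitness = []
    for i in range(n):
        dists = sorted(math.dist(objectives[i], objectives[j])
                       for j in range(n) if j != i)
        sigma_k = dists[min(k, len(dists)) - 1] if dists else 0.0
        fitness.append(raw[i] + 1.0 / (sigma_k + 2.0))  # D(i) is at most 0.5
    return fitness
```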



Author information

Authors and Affiliations

Department of Computer Science and Technology, Indian Institute of Engineering, Science & Technology, Shibpur, Howrah, West Bengal, India
Asit Kumar Das
Department of Computer Science and Engineering, The Neotia University, South 24 Parganas, West Bengal, India
Sunanda Das


Corresponding author

Correspondence to Sunanda Das.

Editor information

Editors and Affiliations

University of Kalyani, Kalyani, West Bengal, India
Jyotsna K. Mandal
Assam University, Silchar, Assam, India
Somnath Mukhopadhyay
Visva Bharati University, Bolpur, Santiniketan, West Bengal, India
Paramartha Dutta


About this chapter

Cite this chapter

Das, A.K., Das, S. (2018). A Comparative Study on Different Versions of Multi-Objective Genetic Algorithm for Simultaneous Gene Selection and Sample Categorization. In: Mandal, J., Mukhopadhyay, S., Dutta, P. (eds) Multi-Objective Optimization. Springer, Singapore. https://doi.org/10.1007/978-981-13-1471-1_11


DOI: https://doi.org/10.1007/978-981-13-1471-1_11
Published: 19 August 2018
Publisher Name: Springer, Singapore
Print ISBN: 978-981-13-1470-4
Online ISBN: 978-981-13-1471-1
eBook Packages: Computer Science, Computer Science (R0)
