Bi-objective Genetic Algorithm with Rough Set Theory for Important Gene Selection in Disease Diagnosis

Abstract

Gene selection is a common task in bioinformatics, where data mining and knowledge discovery play a significant role in selecting an optimal set of genes with respect to useful evaluation functions. Gene selection based on a single-objective genetic algorithm may not provide the best solution because the characteristics of datasets vary. When multiple objective functions are combined, an algorithm generally identifies more important genes than one relying on a single criterion. Here, two criteria are combined and a novel bi-objective genetic algorithm for gene selection is proposed, which effectively reduces the dimensionality of high-volume gene datasets without sacrificing meaningful information. The method uses nonlinear hybrid cellular automata to create the initial population and a novel jumping gene technique for mutation to maintain diversity among the chromosomes of the population. It uses rough set theory and Kullback–Leibler divergence to define two fitness functions, which are conflicting in nature and are employed to approximate a Pareto-optimal solution set. The best solutions of the proposed method provide the informative genes used for disease diagnosis. The replacement strategy for creating the next-generation population is based on the Pareto-optimal solutions with respect to both fitness functions. Experimental results on publicly available microarray data demonstrate the importance of the identified genes and the effectiveness of the proposed informative gene selection mechanism.
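
To illustrate what selecting Pareto-optimal solutions under two conflicting fitness functions can look like, the following is a minimal sketch and not the chapter's implementation. It assumes both fitness values are to be maximized, and the names `dominates`, `pareto_front`, and the sample score pairs are hypothetical; the abstract does not specify how the rough-set and Kullback–Leibler fitness values are computed.

```python
from typing import List, Sequence, Tuple


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """Return True if a is no worse than b on every objective and
    strictly better on at least one (maximization is assumed)."""
    no_worse = all(x >= y for x, y in zip(a, b))
    strictly_better = any(x > y for x, y in zip(a, b))
    return no_worse and strictly_better


def pareto_front(scores: List[Tuple[float, float]]) -> List[int]:
    """Return the indices of the non-dominated score pairs."""
    return [
        i
        for i, s in enumerate(scores)
        if not any(dominates(t, s) for j, t in enumerate(scores) if j != i)
    ]


# Hypothetical (fitness_1, fitness_2) pairs, one per chromosome.
scores = [(0.9, 0.2), (0.7, 0.6), (0.6, 0.5), (0.3, 0.9), (0.9, 0.1)]
front = pareto_front(scores)
print(front)  # indices of chromosomes that no other chromosome dominates
```

In a replacement step of this kind, the chromosomes at the returned indices would be the candidates carried into the next generation; the pairwise comparison is quadratic in population size, which is acceptable for a sketch.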


Author information

Corresponding authors

Correspondence to Asit Kumar Das or Soumen Kumar Pati.

Copyright information

© 2018 Springer Nature Singapore Pte Ltd.

About this chapter

Cite this chapter

Das, A.K., Pati, S.K. (2018). Bi-objective Genetic Algorithm with Rough Set Theory for Important Gene Selection in Disease Diagnosis. In: Mandal, J., Mukhopadhyay, S., Dutta, P. (eds) Multi-Objective Optimization. Springer, Singapore. https://doi.org/10.1007/978-981-13-1471-1_13

  • DOI: https://doi.org/10.1007/978-981-13-1471-1_13

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-13-1470-4

  • Online ISBN: 978-981-13-1471-1

  • eBook Packages: Computer Science, Computer Science (R0)
