Part of the book series: Springer Handbooks (SHB)

Abstract

Biomedical research is progressing rapidly, particularly in genomic and postgenomic research, and the large volumes of data it generates pose many challenges for biostatistics and bioinformatics. After outlining some of these challenges, this chapter presents evolutionary combinatorial optimization approaches proposed for knowledge discovery in bioinformatics. It focuses on three main data mining tasks widely encountered in bioinformatics applications: association rules, feature selection, and clustering. For each task, a description is given, together with information about its uses in bioinformatics. Some evolutionary approaches proposed to tackle the task are then presented and discussed.

Abbreviations

3-D: three-dimensional

AIC: Akaike information criterion

AUC: area under the (ROC) curve

BIC: Bayesian information criterion

BioHEL: bioinformatics-oriented hierarchical evolutionary learning

CA: classification accuracy

CFS: correlation feature selection

DIC: deviance information criterion

DNA: deoxyribonucleic acid

EA: evolutionary algorithm

ELSA: evolutionary local selection algorithm

GA: genetic algorithm

GGP: grammar-based genetic programming

GP: genetic programming

GWAS: genome-wide association studies

KNN: k-nearest neighbor

LCS: learning classifier system

LOOCV: leave-one-out cross-validation

LS: local search

MDL: minimum description length

MLR: multiple linear regression

MOEA: multiobjective evolutionary algorithm

mRMR: minimal-redundancy-maximal-relevance

NN: neural network

PLS: partial least squares

RMSEP: root-mean-square error of prediction

ROC: receiver operating characteristic

RWS: roulette wheel selection

SBS: sequential backward selection

SFS: sequential forward selection

SNP: single nucleotide polymorphism

SSOCF: subset size-oriented common features

SUS: stochastic universal sampling

SVM: support vector machine

VRC: variance ratio criterion

XB: Xie-Beni cluster validity index

Author information

Corresponding author

Correspondence to Julie Hamon.

Copyright information

© 2015 Springer-Verlag Berlin Heidelberg

About this chapter

Cite this chapter

Hamon, J., Jacques, J., Jourdan, L., Dhaenens, C. (2015). Knowledge Discovery in Bioinformatics. In: Kacprzyk, J., Pedrycz, W. (eds) Springer Handbook of Computational Intelligence. Springer Handbooks. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-662-43505-2_61

  • DOI: https://doi.org/10.1007/978-3-662-43505-2_61

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-662-43504-5

  • Online ISBN: 978-3-662-43505-2

  • eBook Packages: Engineering, Engineering (R0)
