Estimation of Distribution Algorithms in Gene Expression Data Analysis

Chapter
Part of the Intelligent Systems Reference Library book series (ISRL, volume 25)

Abstract

Estimation of Distribution Algorithms (EDAs) are a relatively new class of optimization methods in the field of evolutionary algorithms. EDAs build probabilistic models from promising solutions to learn properties of the problem being solved, and use these models to guide the search process. The models can also reveal previously unknown regularities in the search space. EDAs have been used to solve some challenging NP-hard bioinformatics problems and have demonstrated competitive accuracy. In this chapter, we first provide an overview of existing EDAs, then review some of their applications in bioinformatics, and finally discuss in more detail a specific problem that has been solved with this method.
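The loop described above (sample candidates from a model, keep the promising ones, re-estimate the model from them) can be illustrated with a minimal sketch. It is not taken from the chapter: it assumes the simplest univariate model, in the spirit of the Univariate Marginal Distribution Algorithm, applied to the toy OneMax problem (maximize the number of ones in a bit string). The function names `onemax` and `umda` and all parameter values are hypothetical and chosen only for illustration.

```python
import random


def onemax(bits):
    """Toy fitness function (an assumption for this sketch): count the ones."""
    return sum(bits)


def umda(fitness, n_bits=20, pop_size=100, n_select=50, generations=50, seed=0):
    """Minimal univariate EDA sketch: one independent Bernoulli model per bit."""
    rng = random.Random(seed)
    probs = [0.5] * n_bits      # initial model: every bit is 1 with probability 0.5
    margin = 1.0 / n_bits       # keeps probabilities away from 0 and 1
    best = None
    for _ in range(generations):
        # 1. Sample a population from the current probabilistic model.
        population = [
            [1 if rng.random() < p else 0 for p in probs]
            for _ in range(pop_size)
        ]
        # 2. Select the promising solutions (truncation selection).
        population.sort(key=fitness, reverse=True)
        selected = population[:n_select]
        if best is None or fitness(selected[0]) > fitness(best):
            best = selected[0]
        # 3. Re-estimate the model from the selected solutions.
        probs = [sum(ind[i] for ind in selected) / n_select for i in range(n_bits)]
        probs = [min(1.0 - margin, max(margin, p)) for p in probs]
    return best, probs


if __name__ == "__main__":
    best, probs = umda(onemax)
    print("best fitness:", onemax(best))
    print("learned marginals:", [round(p, 2) for p in probs])
```

The learned marginals are the "model" in this sketch: bits whose probability drifts toward 1 are those the selected solutions agree on, which is the kind of regularity a probabilistic model can expose. Multivariate EDAs such as those based on Bayesian networks replace the independent marginals with a model that also captures dependencies between variables.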

Keywords

Bayesian Network; Distribution Algorithm; Feature Subset Selection; Gene Expression Data Analysis; Bayesian Optimization Algorithm

Copyright information

© Springer-Verlag Berlin Heidelberg 2012

Authors and Affiliations

  1. Department of Computer Science, University of Windsor, Windsor, Canada
