MBMEDA: An Application of Estimation of Distribution Algorithms to the Problem of Finding Biological Motifs

Part of the Lecture Notes in Computer Science book series (LNCS, volume 9107)

Abstract

In this work we examine the problem of finding biological motifs in DNA databases. We address it with MBMEDA, an evolutionary method based on the Estimation of Distribution Algorithm (EDA). Although MBMEDA assumes statistical independence between the main variables of the problem, its results were satisfactory when compared with those obtained by other methods, and in some cases better. Performance was measured with two metrics taken from the field of information retrieval: precision and recall. The comparison involved searching for a motif in two types of DNA datasets: synthetic and real. On a set of five real databases, the average precision and recall were 0.866 and 0.798, respectively.
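As an illustration of the two metrics named above, the following minimal sketch computes precision and recall for a set of predicted motif sites. It is not taken from the paper: the function name `precision_recall`, the representation of a site as a `(sequence_index, start_position)` pair, the exact-match criterion, and the sample data are all assumptions made for the example.

```python
def precision_recall(predicted, actual):
    """Return (precision, recall) for predicted versus true motif sites.

    Assumption: a predicted site counts as correct only if it matches
    a true site exactly (same sequence index and start position).
    """
    predicted, actual = set(predicted), set(actual)
    true_positives = len(predicted & actual)
    # Precision: fraction of predicted sites that are true sites.
    precision = true_positives / len(predicted) if predicted else 0.0
    # Recall: fraction of true sites that were predicted.
    recall = true_positives / len(actual) if actual else 0.0
    return precision, recall


# Hypothetical sites given as (sequence_index, start_position) pairs.
predicted_sites = [(0, 12), (1, 40), (2, 7), (3, 19)]
true_sites = [(0, 12), (1, 40), (2, 8), (3, 19), (4, 33)]

precision, recall = precision_recall(predicted_sites, true_sites)
print(f"precision={precision:.3f} recall={recall:.3f}")
```

In this made-up example three of the four predicted sites are correct and three of the five true sites are found. A real evaluation might instead count partially overlapping sites or individual nucleotides, which would change both values.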

Keywords

DNA dataset; Estimation of distribution algorithms; Molecular biology; Transcription factor; Motifs

Copyright information

© Springer International Publishing Switzerland 2015

Authors and Affiliations

  1. Facultad de Ingeniería en Electricidad y Computación, Escuela Superior Politécnica del Litoral (ESPOL), Guayaquil, Ecuador
