
Multiple Sequence Local Alignment Using Monte Carlo EM Algorithm

  • Conference paper
Bioinformatics Research and Applications (ISBRA 2007)

Part of the book series: Lecture Notes in Computer Science (LNBI, volume 4463)

Abstract

The Expectation Maximization (EM) motif-finding algorithm is one of the most popular de novo motif discovery methods. However, EM depends heavily on its initialization and can easily become trapped in local optima. This paper implements a Monte Carlo version of the EM algorithm that performs multiple sequence local alignment, in order to overcome these drawbacks of conventional EM motif-finding algorithms. The new algorithm is named the Monte Carlo EM Motif Discovery Algorithm (MCEMDA). MCEMDA starts from an initial model and then iteratively alternates a Monte Carlo simulation step and a parameter update step until convergence. MCEMDA is compared with other popular motif-finding algorithms on simulated, prokaryotic and eukaryotic motif sequences, and the results show that MCEMDA outperforms the other algorithms tested. MCEMDA also discovers a helix-turn-helix motif in protein sequences. The method provides a general framework for developing motif-finding algorithms. A website for this program will be available at http://motif.cmh.edu.
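To make the loop described in the abstract more concrete, here is a minimal Python sketch of a Monte Carlo EM motif search. It is not MCEMDA itself. The one-occurrence-per-sequence model, the fixed uniform DNA background, the pseudocount, the number of Monte Carlo draws, the stopping rule and the function names (`mcem_motif`, `site_weights`, `consensus`) are all illustrative assumptions. The exact E-step expectation over motif start positions is replaced by an average over sampled starts (the "Monte Carlo simulation" step), and the M-step rebuilds a position weight matrix (PWM) from those sampled sites (the "parameter update" step).

```python
import random

ALPHABET = "ACGT"
IDX = {c: k for k, c in enumerate(ALPHABET)}


def site_weights(seq, pwm, bg):
    """Likelihood ratio P(window | motif) / P(window | background) per start.

    Plain products are fine for short motifs; long motifs would need log space.
    """
    w = len(pwm)
    weights = []
    for start in range(len(seq) - w + 1):
        r = 1.0
        for j in range(w):
            k = IDX[seq[start + j]]
            r *= pwm[j][k] / bg[k]
        weights.append(r)
    return weights


def mcem_motif(seqs, width, n_samples=50, pseudo=0.5, max_iter=100,
               tol=1e-4, init_word=None, seed=0):
    """Hypothetical Monte Carlo EM motif search (one site per sequence).

    Returns (pwm, positions): pwm[j][k] is P(letter k at motif column j), and
    positions[i] is the most frequently sampled start in sequence i during
    the final iteration.
    """
    rng = random.Random(seed)
    bg = [0.25] * 4  # assumption: fixed, uniform background
    if init_word is None:  # initial model: a random W-mer from the data
        s = rng.choice(seqs)
        start = rng.randrange(len(s) - width + 1)
        init_word = s[start:start + width]
    if len(init_word) != width:
        raise ValueError("init_word must have length `width`")
    # Soften the starting word into a PWM: 0.5 on its letter, 0.5/3 elsewhere.
    pwm = [[0.5 if c == ch else 0.5 / 3 for c in ALPHABET] for ch in init_word]
    positions = [0] * len(seqs)
    for _ in range(max_iter):
        counts = [[pseudo] * 4 for _ in range(width)]
        for i, seq in enumerate(seqs):
            wts = site_weights(seq, pwm, bg)
            # Monte Carlo E-step: draw start positions from the posterior
            # (random.choices normalises the weights itself).
            draws = rng.choices(range(len(wts)), weights=wts, k=n_samples)
            positions[i] = max(set(draws), key=draws.count)
            # Each sequence contributes one site in total, averaged over draws.
            for start in draws:
                for j in range(width):
                    counts[j][IDX[seq[start + j]]] += 1.0 / n_samples
        # M-step: re-estimate each PWM column from the sampled sites.
        new_pwm = [[c / sum(col) for c in col] for col in counts]
        delta = max(abs(a - b) for old, new in zip(pwm, new_pwm)
                    for a, b in zip(old, new))
        pwm = new_pwm
        if delta < tol:  # simplistic stop: PWM stopped changing
            break
    return pwm, positions


def consensus(pwm):
    """Most probable letter in each PWM column."""
    return "".join(ALPHABET[max(range(4), key=col.__getitem__)] for col in pwm)
```

For example, planting `GCTTCG` at different offsets in otherwise A-only sequences and calling `mcem_motif(seqs, 6, init_word="GCTTCG")` should return a PWM whose consensus is `GCTTCG`. Because every E-step is a random draw, the PWM fluctuates between iterations. Monte Carlo EM implementations commonly increase the number of draws as the iterations proceed to damp this noise; the sketch keeps it fixed for brevity.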

Editor information

Ion Măndoiu, Alexander Zelikovsky

Copyright information

© 2007 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Bi, C. (2007). Multiple Sequence Local Alignment Using Monte Carlo EM Algorithm. In: Măndoiu, I., Zelikovsky, A. (eds) Bioinformatics Research and Applications. ISBRA 2007. Lecture Notes in Computer Science, vol 4463. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-72031-7_42

  • DOI: https://doi.org/10.1007/978-3-540-72031-7_42

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-72030-0

  • Online ISBN: 978-3-540-72031-7

  • eBook Packages: Computer Science, Computer Science (R0)
