Abstract
The Expectation Maximization (EM) motif-finding algorithm is one of the most popular de novo motif discovery methods. However, the EM algorithm depends heavily on its initialization and is easily trapped in local optima. This paper implements a Monte Carlo version of the EM algorithm that performs multiple sequence local alignment to overcome these drawbacks of conventional EM motif-finding algorithms. The new algorithm is named the Monte Carlo EM Motif Discovery Algorithm (MCEMDA). MCEMDA starts from an initial model and then iterates Monte Carlo simulation and parameter update steps until convergence. MCEMDA is compared with other popular motif-finding algorithms on simulated, prokaryotic, and eukaryotic motif sequences. The results show that MCEMDA outperforms the other algorithms. MCEMDA also successfully discovers a helix-turn-helix motif in protein sequences. It provides a general framework for motif-finding algorithm development. A website for this program will be available at http://motif.cmh.edu.
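The loop described in the abstract (initial model, Monte Carlo simulation step, parameter update, repeat until convergence) can be illustrated with a minimal Python sketch. The abstract does not specify MCEMDA's model, sampling scheme, or stopping rule, so everything below is an assumption made for illustration: one motif site per DNA sequence, a position weight matrix (PWM) with pseudocounts, one sampled site per sequence per iteration, and a stopping rule based on the alignment staying unchanged. The names `mcem_motif`, `site_weights`, `update_pwm`, and `background` are hypothetical and are not taken from the paper.

```python
import random

ALPHABET = "ACGT"  # assumption: DNA input containing only these letters


def background(seqs, pseudo=0.5):
    """Estimate background letter frequencies from all sequences."""
    counts = {a: pseudo for a in ALPHABET}
    for s in seqs:
        for c in s:
            counts[c] += 1
    total = sum(counts.values())
    return {a: counts[a] / total for a in ALPHABET}


def update_pwm(seqs, starts, width, pseudo=0.5):
    """Parameter update: column-wise letter frequencies of the aligned sites."""
    pwm = []
    for j in range(width):
        counts = {a: pseudo for a in ALPHABET}
        for s, p in zip(seqs, starts):
            counts[s[p + j]] += 1
        total = sum(counts.values())
        pwm.append({a: counts[a] / total for a in ALPHABET})
    return pwm


def site_weights(seq, pwm, bg):
    """Motif-to-background likelihood ratio for every start position."""
    width = len(pwm)
    weights = []
    for p in range(len(seq) - width + 1):
        ratio = 1.0
        for j in range(width):
            ratio *= pwm[j][seq[p + j]] / bg[seq[p + j]]
        weights.append(ratio)
    return weights


def mcem_motif(seqs, width, max_iter=200, patience=5, rng=None):
    """Illustrative Monte Carlo EM loop (not the published MCEMDA)."""
    rng = rng or random.Random()
    bg = background(seqs)
    # Initial model: a PWM built from randomly chosen sites.
    starts = [rng.randrange(len(s) - width + 1) for s in seqs]
    pwm = update_pwm(seqs, starts, width)
    unchanged = 0
    for _ in range(max_iter):
        # Monte Carlo step: sample one site per sequence in proportion
        # to its likelihood ratio under the current model.
        new_starts = []
        for s in seqs:
            w = site_weights(s, pwm, bg)
            new_starts.append(rng.choices(range(len(w)), weights=w)[0])
        # Parameter update step: re-estimate the PWM from the sampled sites.
        pwm = update_pwm(seqs, new_starts, width)
        # Assumed stopping rule: alignment unchanged for `patience` iterations.
        unchanged = unchanged + 1 if new_starts == starts else 0
        starts = new_starts
        if unchanged >= patience:
            break
    return starts, pwm
```

For example, `mcem_motif(["ACGTTGACAT", "TTTGACAGGC", "CATTGACAAA"], width=6, rng=random.Random(0))` returns one sampled start position per sequence together with the fitted PWM. Because the sampling is stochastic, different seeds can give different alignments, which is why a run is usually repeated from several initial models.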
© 2007 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Bi, C. (2007). Multiple Sequence Local Alignment Using Monte Carlo EM Algorithm. In: Măndoiu, I., Zelikovsky, A. (eds) Bioinformatics Research and Applications. ISBRA 2007. Lecture Notes in Computer Science, vol 4463. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-72031-7_42
DOI: https://doi.org/10.1007/978-3-540-72031-7_42
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-72030-0
Online ISBN: 978-3-540-72031-7
eBook Packages: Computer Science (R0)