Motif Discovery Using Expectation Maximization and Gibbs’ Sampling

Stormo, Gary D.

doi:10.1007/978-1-60761-854-6_6

Part of the book series: Methods in Molecular Biology (MIMB, volume 674)

Abstract

Expectation maximization and Gibbs’ sampling are two statistical approaches used to identify transcription factor binding sites and the motif that represents them. Both take unaligned sequences as input and search for a statistically significant alignment of putative binding sites. Expectation maximization is deterministic: starting from the same initial parameters, it always converges to the same solution, so it is wise to run it multiple times from different initial parameters. Gibbs’ sampling is stochastic, so it may arrive at different solutions even from the same initial parameters. In both cases multiple runs are advised, because comparing the solutions from each run can indicate whether a globally optimal solution is likely to have been found.
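
The contrast between the two approaches can be illustrated with a minimal Python sketch. It is not the chapter's own protocol. It assumes a simplified model in which every sequence contains exactly one site of known width, the background is uniform, and a fixed pseudocount is used. The function names (em, gibbs, weighted_pwm, window_ratios, one_hot) and the toy sequences are hypothetical and chosen only for illustration.

```python
import random

ALPHABET = "ACGT"
BACKGROUND = 0.25  # assumption: uniform background base frequency


def normalize(counts, pseudo=0.5):
    """Turn per-column base counts into probabilities, adding a pseudocount."""
    pwm = []
    for col in counts:
        total = sum(col.values()) + pseudo * len(ALPHABET)
        pwm.append({b: (col[b] + pseudo) / total for b in ALPHABET})
    return pwm


def weighted_pwm(seqs, weights, width):
    """Build a position weight matrix from all windows, weighted by weights[i][j]."""
    counts = [dict.fromkeys(ALPHABET, 0.0) for _ in range(width)]
    for seq, row in zip(seqs, weights):
        for j, z in enumerate(row):
            if z > 0.0:
                for k in range(width):
                    counts[k][seq[j + k]] += z
    return normalize(counts)


def window_ratios(seq, pwm):
    """Likelihood ratio (motif vs. background) for every window of seq."""
    width = len(pwm)
    ratios = []
    for j in range(len(seq) - width + 1):
        r = 1.0
        for k in range(width):
            r *= pwm[k][seq[j + k]] / BACKGROUND
        ratios.append(r)
    return ratios


def one_hot(seqs, starts, width):
    """Weights that put all mass on a single start position per sequence."""
    return [[1.0 if j == s else 0.0 for j in range(len(seq) - width + 1)]
            for seq, s in zip(seqs, starts)]


def em(seqs, starts, width, iters=50):
    """Deterministic: soft-assign every window, then re-estimate the matrix."""
    pwm = weighted_pwm(seqs, one_hot(seqs, starts, width), width)
    for _ in range(iters):
        posteriors = []
        for seq in seqs:  # E-step: posterior over site positions
            ratios = window_ratios(seq, pwm)
            total = sum(ratios)
            posteriors.append([r / total for r in ratios])
        pwm = weighted_pwm(seqs, posteriors, width)  # M-step
    return [max(range(len(r)), key=r.__getitem__)
            for r in (window_ratios(seq, pwm) for seq in seqs)]


def gibbs(seqs, starts, width, iters=200, rng=None):
    """Stochastic: resample one sequence's site from a matrix built on the rest."""
    if rng is None:
        rng = random.Random()
    starts = list(starts)
    for _ in range(iters):
        for i, seq in enumerate(seqs):
            others = seqs[:i] + seqs[i + 1:]
            other_starts = starts[:i] + starts[i + 1:]
            pwm = weighted_pwm(others, one_hot(others, other_starts, width), width)
            ratios = window_ratios(seq, pwm)
            starts[i] = rng.choices(range(len(ratios)), weights=ratios)[0]
    return starts


if __name__ == "__main__":
    # Made-up toy sequences, each containing the hexamer TATAAT.
    seqs = ["GCGTATAATGCCGTCA", "ACCGGTTATAATCGGA",
            "TTGCACGTATAATGCA", "CGTATAATAGGCTTGC"]
    width = 6
    demo_rng = random.Random(0)
    for run in range(3):  # several restarts, as the abstract advises
        starts = [demo_rng.randrange(len(s) - width + 1) for s in seqs]
        print("run", run, "EM:", em(seqs, starts, width),
              "Gibbs:", gibbs(seqs, starts, width, rng=demo_rng))
```

In this sketch, calling em twice with the same starting positions returns the same answer, whereas gibbs can return different positions on repeated calls unless the random number generator is seeded identically. Comparing the site positions reported across restarts is one simple way to judge whether the runs agree on a single solution.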

Author information

Authors and Affiliations

Department of Genetics, School of Medicine, Washington University, St. Louis, MO, USA
Gary D. Stormo

Corresponding author

Correspondence to Gary D. Stormo.

Editor information

Editors and Affiliations

Department of Statistics, University of Nebraska-Lincoln, Vine Street 1901, Lincoln, 68588-0665, Nebraska, USA
Istvan Ladunga

About this protocol

Cite this protocol

Stormo, G.D. (2010). Motif Discovery Using Expectation Maximization and Gibbs’ Sampling. In: Ladunga, I. (eds) Computational Biology of Transcription Factor Binding. Methods in Molecular Biology, vol 674. Humana Press, Totowa, NJ. https://doi.org/10.1007/978-1-60761-854-6_6

DOI: https://doi.org/10.1007/978-1-60761-854-6_6
Published: 23 August 2010
Publisher Name: Humana Press, Totowa, NJ
Print ISBN: 978-1-60761-853-9
Online ISBN: 978-1-60761-854-6
eBook Packages: Springer Protocols
