Motif Discovery Using Expectation Maximization and Gibbs’ Sampling

  • Protocol
Computational Biology of Transcription Factor Binding

Part of the book series: Methods in Molecular Biology (MIMB, volume 674)

Abstract

Expectation maximization and Gibbs’ sampling are two statistical approaches used to identify transcription factor binding sites and the motif that represents them. Both take unaligned sequences as input and search for a statistically significant alignment of putative binding sites. Expectation maximization is deterministic: starting from the same initial parameters, it always converges to the same solution, so it is wise to run it multiple times from different initial parameters. Gibbs’ sampling is stochastic, so it may arrive at different solutions from the same initial parameters. In both cases, multiple runs are advised because comparing the solutions from each run can indicate whether a globally optimal solution is likely to have been reached.
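
The contrast drawn in the abstract can be illustrated with a minimal Python sketch. It is not the chapter's own implementation: it assumes a simplified model with exactly one site per sequence, a uniform background, and a fixed motif width, and the function names (`pwm_from_sites`, `site_likelihoods`, `em`, `gibbs`), the pseudocount, and the four toy sequences are all invented for illustration.

```python
import math
import random

ALPHABET = "ACGT"

def pwm_from_sites(sites, pseudo=0.5):
    """Column-wise base frequencies from weighted sites [(kmer, weight), ...]."""
    width = len(sites[0][0])
    counts = [{b: pseudo for b in ALPHABET} for _ in range(width)]
    for kmer, weight in sites:
        for j, base in enumerate(kmer):
            counts[j][base] += weight
    return [{b: col[b] / sum(col.values()) for b in ALPHABET} for col in counts]

def site_likelihoods(seq, pwm, bg=0.25):
    """Motif-versus-background likelihood ratio for every window of seq."""
    w = len(pwm)
    return [math.prod(pwm[j][seq[i + j]] / bg for j in range(w))
            for i in range(len(seq) - w + 1)]

def em(seqs, start_pwm, n_iter=50):
    """Deterministic EM: the same start_pwm always yields the same result."""
    pwm, w = start_pwm, len(start_pwm)
    for _ in range(n_iter):
        weighted = []
        for s in seqs:
            lik = site_likelihoods(s, pwm)
            total = sum(lik)
            # E-step: posterior weight of each window being the site
            weighted += [(s[i:i + w], l / total) for i, l in enumerate(lik)]
        pwm = pwm_from_sites(weighted)  # M-step: re-estimate the motif
    return pwm

def gibbs(seqs, width, start_pos, rng, n_iter=200):
    """Stochastic Gibbs sampler: resamples one sequence's site at a time."""
    pos = list(start_pos)
    for _ in range(n_iter):
        for k, s in enumerate(seqs):
            # Build the motif from the current sites of all other sequences
            others = [(t[p:p + width], 1.0)
                      for n, (t, p) in enumerate(zip(seqs, pos)) if n != k]
            lik = site_likelihoods(s, pwm_from_sites(others))
            # Draw a new position in proportion to its likelihood ratio
            pos[k] = rng.choices(range(len(lik)), weights=lik)[0]
    return pos

# Toy data (made up): each sequence contains one copy of TTGACA.
seqs = ["ACGTTGACATTA", "GGTTGACACCGT", "CATTGACAGGAA", "TTGACATCGCGC"]
start_pwm = pwm_from_sites([(seqs[0][:6], 1.0)])
start_pos = [0, 0, 0, 0]

print("EM repeatable:", em(seqs, start_pwm) == em(seqs, start_pwm))
print("Gibbs, seed 1:", gibbs(seqs, 6, start_pos, random.Random(1)))
print("Gibbs, seed 2:", gibbs(seqs, 6, start_pos, random.Random(2)))
```

Calling `em` twice with the same `start_pwm` gives identical matrices, so exploring other solutions means changing the starting point. The two `gibbs` calls share `start_pos` and differ only in the random stream, so their final positions may or may not agree; comparing such runs is the check the abstract recommends.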

Author information

Corresponding author

Correspondence to Gary D. Stormo.

Copyright information

© 2010 Springer Science+Business Media, LLC

About this protocol

Cite this protocol

Stormo, G.D. (2010). Motif Discovery Using Expectation Maximization and Gibbs’ Sampling. In: Ladunga, I. (ed.) Computational Biology of Transcription Factor Binding. Methods in Molecular Biology, vol 674. Humana Press, Totowa, NJ. https://doi.org/10.1007/978-1-60761-854-6_6

  • DOI: https://doi.org/10.1007/978-1-60761-854-6_6

  • Publisher Name: Humana Press, Totowa, NJ

  • Print ISBN: 978-1-60761-853-9

  • Online ISBN: 978-1-60761-854-6

  • eBook Packages: Springer Protocols
