Motif Discovery Using Expectation Maximization and Gibbs’ Sampling
Expectation maximization and Gibbs’ sampling are two statistical approaches used to identify transcription factor binding sites and the motif that represents them. Both take unaligned sequences as input and search for a statistically significant alignment of putative binding sites. Expectation maximization is deterministic: starting from the same initial parameters, it always converges to the same solution, so it is wise to run it several times from different initial parameters. Gibbs’ sampling is stochastic, so it may arrive at different solutions even from the same initial parameters. In both cases multiple runs are advised, because comparing the solutions from each run can indicate whether a globally optimal solution is likely to have been found.
Key words: Expectation maximization, Gibbs’ sampling, transcription factor binding sites, motif discovery, position weight matrices, position frequency matrices, regulatory sites, motif modeling
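The contrast between the deterministic and the stochastic search can be illustrated with a minimal sketch. It is not the implementation used by MEME, BioProspector, or any other published tool. It assumes exactly one site per sequence, a uniform background, a fixed motif width, and a pseudocount of 0.5. All function and parameter names (`column_probs`, `site_weights`, `em_motif`, `gibbs_motif`, `start_positions`) are hypothetical and chosen only for illustration.

```python
import random

ALPHABET = "ACGT"

def column_probs(sites, width, pseudo=0.5):
    """Per-column base probabilities (with pseudocounts) from aligned sites."""
    pfm = []
    for j in range(width):
        counts = {b: pseudo for b in ALPHABET}
        for site in sites:
            counts[site[j]] += 1
        total = sum(counts.values())
        pfm.append({b: counts[b] / total for b in ALPHABET})
    return pfm

def site_weights(seq, pfm, width, background=0.25):
    """Motif-vs-background likelihood ratio for every window of seq."""
    weights = []
    for i in range(len(seq) - width + 1):
        w = 1.0
        for j in range(width):
            w *= pfm[j][seq[i + j]] / background
        weights.append(w)
    return weights

def em_motif(seqs, width, start_positions, iterations=50):
    """Deterministic EM: identical start_positions give identical results."""
    sites = [s[p:p + width] for s, p in zip(seqs, start_positions)]
    pfm = column_probs(sites, width)
    for _ in range(iterations):
        counts = [{b: 0.5 for b in ALPHABET} for _ in range(width)]
        for s in seqs:
            w = site_weights(s, pfm, width)      # E-step: weight each window
            z = sum(w)
            for i, wi in enumerate(w):           # M-step: expected base counts
                for j in range(width):
                    counts[j][s[i + j]] += wi / z
        pfm = [{b: c[b] / sum(c.values()) for b in ALPHABET} for c in counts]
    positions = []
    for s in seqs:
        w = site_weights(s, pfm, width)
        positions.append(max(range(len(w)), key=w.__getitem__))
    return positions, pfm

def gibbs_motif(seqs, width, start_positions, sweeps=200, rng=None):
    """Stochastic Gibbs sampling: runs may differ unless rng is seeded."""
    rng = rng or random.Random()
    pos = list(start_positions)
    for _ in range(sweeps):
        for k, s in enumerate(seqs):
            # Build the motif model from all sequences except the k-th one.
            others = [t[p:p + width]
                      for i, (t, p) in enumerate(zip(seqs, pos)) if i != k]
            pfm = column_probs(others, width)
            w = site_weights(s, pfm, width)
            # Sample a new site for sequence k in proportion to its weight.
            pos[k] = rng.choices(range(len(w)), weights=w)[0]
    return pos

if __name__ == "__main__":
    # Toy sequences (made up for this sketch), each containing TTGACA.
    toy = ["ACGTTGACAGGC", "TTGACACCGTAA", "GGCATTGACATC", "CATGCGTTGACA"]
    start = [0, 0, 0, 0]
    print("EM run 1:", em_motif(toy, 6, start)[0])
    print("EM run 2:", em_motif(toy, 6, start)[0])   # same as run 1
    print("Gibbs run 1:", gibbs_motif(toy, 6, start))
    print("Gibbs run 2:", gibbs_motif(toy, 6, start))  # may differ from run 1
```

Repeating `em_motif` from several different `start_positions`, or `gibbs_motif` several times, and comparing the returned site positions is one simple way to carry out the multiple-run comparison advised above.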