Probabilistic Approaches to Transcription Factor Binding Site Prediction

Posch, Stefan; Grau, Jan; Gohr, André; Keilwagen, Jens; Grosse, Ivo

doi:10.1007/978-1-60761-854-6_7

Part of the book series: Methods in Molecular Biology (MIMB, volume 674)

Abstract

Many different computer programs for the prediction of transcription factor binding sites have been developed over the last decades. These programs differ in the objectives they pursue and in the sources of information they take into account. Programs based on statistical approaches differ at an elementary level in the statistical models used for individual binding sites and flanking sequences and in the learning principles employed for estimating the model parameters. In our experience, both the models and the learning principles should be chosen with great care, depending on the specific task at hand, but many existing programs do not allow the user to choose them freely. Hence, we developed Jstacs, an object-oriented Java framework for sequence analysis, which allows the user to combine different statistical models and different learning principles in a modular manner with little effort. In this chapter, we explain how Jstacs can be used for the recognition of transcription factor binding sites.
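
The idea of combining interchangeable statistical models behind one interface can be illustrated with a minimal Java sketch. It does not use the Jstacs API: the names ModularSketch, Model, Pwm, HomogeneousOrder0, and logOdds are hypothetical, and the choice of a position weight matrix for the binding sites, an order-0 homogeneous model for the background, and pseudocount-smoothed frequency estimates as the learning principle are assumptions made only for illustration.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical sketch; none of these names are Jstacs classes. */
public final class ModularSketch {

    /** A statistical model that can be trained on sequences and can score a sequence. */
    interface Model {
        void train(List<String> sequences);
        double logProb(String sequence);
    }

    /** Maps a DNA base to an array index 0..3. */
    static int index(char base) {
        switch (Character.toUpperCase(base)) {
            case 'A': return 0;
            case 'C': return 1;
            case 'G': return 2;
            case 'T': return 3;
            default: throw new IllegalArgumentException("Not a DNA base: " + base);
        }
    }

    /** Position weight matrix: one independent base distribution per position. */
    static final class Pwm implements Model {
        private final double[][] logP;
        private final double pseudocount;

        Pwm(int length, double pseudocount) {
            this.logP = new double[length][4];
            this.pseudocount = pseudocount;
        }

        @Override
        public void train(List<String> sequences) {
            double total = sequences.size() + 4 * pseudocount;
            for (int pos = 0; pos < logP.length; pos++) {
                double[] counts = new double[4];
                Arrays.fill(counts, pseudocount);
                for (String s : sequences) {
                    counts[index(s.charAt(pos))]++;
                }
                for (int b = 0; b < 4; b++) {
                    logP[pos][b] = Math.log(counts[b] / total);
                }
            }
        }

        @Override
        public double logProb(String sequence) {
            double sum = 0.0;
            for (int pos = 0; pos < logP.length; pos++) {
                sum += logP[pos][index(sequence.charAt(pos))];
            }
            return sum;
        }
    }

    /** Order-0 homogeneous model: a single base distribution shared by all positions. */
    static final class HomogeneousOrder0 implements Model {
        private final double[] logP = new double[4];
        private final double pseudocount;

        HomogeneousOrder0(double pseudocount) {
            this.pseudocount = pseudocount;
        }

        @Override
        public void train(List<String> sequences) {
            double[] counts = new double[4];
            Arrays.fill(counts, pseudocount);
            double total = 4 * pseudocount;
            for (String s : sequences) {
                for (char c : s.toCharArray()) {
                    counts[index(c)]++;
                    total++;
                }
            }
            for (int b = 0; b < 4; b++) {
                logP[b] = Math.log(counts[b] / total);
            }
        }

        @Override
        public double logProb(String sequence) {
            double sum = 0.0;
            for (char c : sequence.toCharArray()) {
                sum += logP[index(c)];
            }
            return sum;
        }
    }

    /** Log-likelihood ratio; positive values favour the foreground model. */
    static double logOdds(Model foreground, Model background, String sequence) {
        return foreground.logProb(sequence) - background.logProb(sequence);
    }

    public static void main(String[] args) {
        Model fg = new Pwm(4, 1.0);
        Model bg = new HomogeneousOrder0(1.0);
        fg.train(Arrays.asList("TATA", "TATA", "TACA", "TATA")); // toy binding sites
        bg.train(Arrays.asList("ACGT", "GGCA", "TTAC", "CGTA")); // toy background
        System.out.println(logOdds(fg, bg, "TATA")); // positive: resembles the sites
        System.out.println(logOdds(fg, bg, "GCGC")); // negative: resembles background
    }
}
```

Because logOdds depends only on the Model interface, either model could be replaced by a different implementation (for example a higher-order Markov model) without touching the scoring code. This is the kind of modularity the abstract describes, shown here in toy form.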

Notes

1. http://www.sun.com/java
2. Note that the parameters \(\boldsymbol\theta\) contain the parameters for each class, e.g., \(\boldsymbol\theta_{\textrm{fg}}\), \(\boldsymbol\theta_{\textrm{bg}}\), and the class probabilities.
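
Read in a two-class setting, note 2 is consistent with the usual Bayes posterior for a class \(c \in \{\textrm{fg}, \textrm{bg}\}\). The following is a sketch under assumptions not stated in this excerpt: \(p(c \mid \boldsymbol\theta)\) denotes the class probability and \(P(\boldsymbol{x} \mid \boldsymbol\theta_c)\) the class-conditional likelihood of a sequence \(\boldsymbol{x}\). Both symbols are introduced here for illustration only.

\[
P(c \mid \boldsymbol{x}, \boldsymbol\theta) = \frac{p(c \mid \boldsymbol\theta)\, P(\boldsymbol{x} \mid \boldsymbol\theta_c)}{p(\textrm{fg} \mid \boldsymbol\theta)\, P(\boldsymbol{x} \mid \boldsymbol\theta_{\textrm{fg}}) + p(\textrm{bg} \mid \boldsymbol\theta)\, P(\boldsymbol{x} \mid \boldsymbol\theta_{\textrm{bg}})}
\]

Under this reading, \(\boldsymbol\theta\) must include the class probabilities as well as \(\boldsymbol\theta_{\textrm{fg}}\) and \(\boldsymbol\theta_{\textrm{bg}}\), because all three appear on the right-hand side.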

Author information

Authors and Affiliations

Institute of Computer Science, Martin Luther University, Halle–Wittenberg, Germany
Stefan Posch, Jan Grau & Ivo Grosse
Leibniz Institute of Plant Biochemistry (IPB), Halle, Germany
André Gohr
Leibniz Institute of Plant Genetics and Crop Plant Research (IPK), Gatersleben, Germany
Jens Keilwagen

Editor information

Editors and Affiliations

Department of Statistics, University of Nebraska-Lincoln, Vine Street 1901, Lincoln, 68588-0665, Nebraska, USA
Istvan Ladunga

About this protocol

Cite this protocol

Posch, S., Grau, J., Gohr, A., Keilwagen, J., Grosse, I. (2010). Probabilistic Approaches to Transcription Factor Binding Site Prediction. In: Ladunga, I. (ed.) Computational Biology of Transcription Factor Binding. Methods in Molecular Biology, vol 674. Humana Press, Totowa, NJ. https://doi.org/10.1007/978-1-60761-854-6_7

DOI: https://doi.org/10.1007/978-1-60761-854-6_7
Published: 23 August 2010
Publisher Name: Humana Press, Totowa, NJ
Print ISBN: 978-1-60761-853-9
Online ISBN: 978-1-60761-854-6
eBook Packages: Springer Protocols
