Probabilistic Approaches to Transcription Factor Binding Site Prediction

Computational Biology of Transcription Factor Binding

Part of the book series: Methods in Molecular Biology (MIMB, volume 674)

Abstract

Many computer programs for predicting transcription factor binding sites have been developed over the last decades. These programs pursue different objectives and take into account different sources of information. Programs based on statistical approaches also differ at an elementary level in the statistical models used for individual binding sites and flanking sequences and in the learning principles employed for estimating the model parameters. In our experience, both the models and the learning principles should be chosen with great care, depending on the specific task at hand, but many existing programs do not allow the user to choose them freely. We therefore developed Jstacs, an object-oriented Java framework for sequence analysis, which allows the user to combine different statistical models and different learning principles in a modular manner with little effort. In this chapter, we explain how Jstacs can be used for the recognition of transcription factor binding sites.

Notes

  1. http://www.sun.com/java

  2. Note that the parameters \(\boldsymbol\theta\) contain the parameters for each class, e.g., \(\boldsymbol\theta_{\textrm{fg}}\), \(\boldsymbol\theta_{\textrm{bg}}\), and the class probabilities.

Copyright information

© 2010 Springer Science+Business Media, LLC

About this protocol

Cite this protocol

Posch, S., Grau, J., Gohr, A., Keilwagen, J., Grosse, I. (2010). Probabilistic Approaches to Transcription Factor Binding Site Prediction. In: Ladunga, I. (eds) Computational Biology of Transcription Factor Binding. Methods in Molecular Biology, vol 674. Humana Press, Totowa, NJ. https://doi.org/10.1007/978-1-60761-854-6_7

  • DOI: https://doi.org/10.1007/978-1-60761-854-6_7

  • Publisher Name: Humana Press, Totowa, NJ

  • Print ISBN: 978-1-60761-853-9

  • Online ISBN: 978-1-60761-854-6

  • eBook Packages: Springer Protocols
