Statistical Mechanics of Transcription-Factor Binding Site Discovery Using Hidden Markov Models

Journal of Statistical Physics

Abstract

Hidden Markov Models (HMMs) are a commonly used tool for inference of transcription factor (TF) binding sites from DNA sequence data. We exploit the mathematical equivalence between HMMs for TF binding and the “inverse” statistical mechanics of hard rods in a one-dimensional disordered potential to investigate learning in HMMs. We derive analytic expressions for the Fisher information, a commonly employed measure of confidence in learned parameters, in the biologically relevant limit where the density of binding sites is low. We then use techniques from statistical mechanics to derive a scaling principle relating the specificity (binding energy) of a TF to the minimum amount of training data necessary to learn it.
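
As a rough illustration of the hard-rod picture mentioned in the abstract, the sketch below computes the partition function of non-overlapping rods on a one-dimensional lattice with position-dependent binding energies, using a forward recursion of the kind used for HMM likelihoods. It is not taken from the article: the function name `rod_partition_function`, the parameter names, and the convention that energies are measured in units of k_B T relative to the unbound background are all assumptions made for this example.

```python
import math


def rod_partition_function(energies, rod_length, fugacity):
    """Partition function of non-overlapping rods on a 1D lattice.

    Hypothetical conventions (not from the article):
    - energies[i] is the binding energy, in units of k_B T, of a rod whose
      left end sits at site i; the rod covers sites i .. i + rod_length - 1.
    - fugacity is the statistical weight for adding one rod.
    - Unbound (background) sites carry weight 1.
    """
    n_sites = len(energies) + rod_length - 1  # total number of lattice sites

    # z[j] is the partition function restricted to the first j sites.
    z = [1.0] * (n_sites + 1)
    for j in range(1, n_sites + 1):
        # Case 1: site j - 1 is unbound.
        z[j] = z[j - 1]
        # Case 2: a rod ends at site j - 1, so it starts at j - rod_length.
        start = j - rod_length
        if start >= 0:
            z[j] += fugacity * math.exp(-energies[start]) * z[start]
    return z[n_sites]


if __name__ == "__main__":
    # Three allowed start positions for rods of length 2 on 4 sites.
    print(rod_partition_function([0.0, 0.0, 0.0], rod_length=2, fugacity=1.0))
```

With all energies set to zero and unit fugacity, the result simply counts the allowed rod configurations. In an HMM reading of the same recursion, the Boltzmann weight would play the role of a motif-versus-background likelihood ratio and the fugacity would be related to the probability of entering the binding-site state; that correspondence is stated here only schematically.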

Corresponding author

Correspondence to Pankaj Mehta.

About this article

Cite this article

Mehta, P., Schwab, D.J. & Sengupta, A.M. Statistical Mechanics of Transcription-Factor Binding Site Discovery Using Hidden Markov Models. J Stat Phys 142, 1187–1205 (2011). https://doi.org/10.1007/s10955-010-0102-x

  • DOI: https://doi.org/10.1007/s10955-010-0102-x
