
Hidden Markov Models for Prediction of Protein Features

  • Protocol
Protein Structure Prediction

Part of the book series: Methods in Molecular Biology™ (MIMB, volume 413)

Summary

Hidden Markov Models (HMMs) are an extremely versatile statistical representation that can be used to model any set of one-dimensional discrete symbol data. HMMs can model protein sequences in many ways, depending on what features of the protein are represented by the Markov states. For protein structure prediction, states have been chosen to represent homologous sequence positions, local or secondary structure types, or transmembrane locality. The resulting models can be used to predict common ancestry, secondary or local structure, or membrane topology by applying one of the two standard algorithms for comparing a sequence to a model. In this chapter, we review those algorithms and discuss how HMMs have been constructed and refined for the purpose of protein structure prediction.
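
To make the idea of comparing a sequence to a model concrete, the following is a minimal sketch and not the chapter's own implementation. It assumes that the two standard algorithms referred to above are Viterbi decoding (the single most probable state path) and the forward algorithm (the total probability of the sequence summed over all paths); the summary itself does not name them. The two-state model, its probabilities, and the reduction of residues to a hydrophobic/other alphabet are all hypothetical values chosen for illustration.

```python
import math

# Hypothetical toy model: 'M' = membrane-like state, 'L' = loop-like state.
# All probabilities below are illustrative, not taken from any published HMM.
STATES = ("M", "L")
START = {"M": 0.5, "L": 0.5}
TRANS = {"M": {"M": 0.9, "L": 0.1}, "L": {"M": 0.1, "L": 0.9}}
HYDROPHOBIC = set("AILMFVW")


def emit(state, residue):
    """Emission probability over a reduced two-symbol alphabet
    (hydrophobic vs. other), so each state's emissions sum to 1."""
    hydrophobic = residue in HYDROPHOBIC
    if state == "M":
        return 0.8 if hydrophobic else 0.2
    return 0.3 if hydrophobic else 0.7


def viterbi(seq):
    """Return (most probable state path, its log joint probability)."""
    v = [{s: math.log(START[s]) + math.log(emit(s, seq[0])) for s in STATES}]
    back = []  # back[i][s] = best predecessor of state s at position i + 1
    for x in seq[1:]:
        row, ptr = {}, {}
        for s in STATES:
            best = max(STATES, key=lambda p: v[-1][p] + math.log(TRANS[p][s]))
            row[s] = v[-1][best] + math.log(TRANS[best][s]) + math.log(emit(s, x))
            ptr[s] = best
        v.append(row)
        back.append(ptr)
    last = max(STATES, key=lambda s: v[-1][s])
    path = [last]
    for ptr in reversed(back):  # trace the pointers back to the first position
        path.append(ptr[path[-1]])
    return "".join(reversed(path)), v[-1][last]


def forward(seq):
    """Return P(seq | model), summed over all state paths.
    Unscaled, so suitable only for short sequences."""
    f = {s: START[s] * emit(s, seq[0]) for s in STATES}
    for x in seq[1:]:
        f = {s: emit(s, x) * sum(f[p] * TRANS[p][s] for p in STATES)
             for s in STATES}
    return sum(f.values())


if __name__ == "__main__":
    seq = "LLLLLLLLKKKKKKKK"
    path, logp = viterbi(seq)
    print(path, logp, forward(seq))
```

The Viterbi path labels each residue with a state, which is the form a topology or secondary structure prediction takes, while the forward probability scores how well the whole sequence fits the model, which is the quantity typically used when asking whether a sequence belongs to a family. A practical implementation would work in log space or with scaling in the forward pass to avoid underflow on full-length proteins.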


Acknowledgments

This material is based upon work supported by the National Science Foundation under Grant DBI-0448072 to C.B.

Copyright information

© 2008 Humana Press Inc

About this protocol

Cite this protocol

Bystroff, C., Krogh, A. (2008). Hidden Markov Models for Prediction of Protein Features. In: Zaki, M.J., Bystroff, C. (eds) Protein Structure Prediction. Methods in Molecular Biology™, vol 413. Humana Press. https://doi.org/10.1007/978-1-59745-574-9_7

  • DOI: https://doi.org/10.1007/978-1-59745-574-9_7

  • Publisher Name: Humana Press

  • Print ISBN: 978-1-58829-752-5

  • Online ISBN: 978-1-59745-574-9

  • eBook Packages: Springer Protocols
