Hidden Markov Models for Prediction of Protein Features

  • Christopher Bystroff
  • Anders Krogh
Part of the Methods in Molecular Biology™ book series (MIMB, volume 413)


Hidden Markov Models (HMMs) are an extremely versatile statistical representation that can be used to model any set of one-dimensional discrete symbol data. HMMs can model protein sequences in many ways, depending on which features of the protein the Markov states represent. For protein structure prediction, states have been chosen to represent homologous sequence positions, local or secondary structure types, or transmembrane locality. The resulting models can be used to predict common ancestry, secondary or local structure, or membrane topology by applying one of the two standard algorithms for comparing a sequence to a model. In this chapter, we review those algorithms and discuss how HMMs have been constructed and refined for the purpose of protein structure prediction.
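As a rough illustration of what comparing a sequence to a model can look like, here is a minimal Python sketch. The abstract does not name the two standard algorithms, so the choice of Viterbi decoding (which appears among the keywords) and the forward algorithm is an assumption. The two-state model, its reduced alphabet ("h" for hydrophobic, "p" for polar), and every probability below are hypothetical toy values, not parameters from the chapter.

```python
import math

# Hypothetical two-state toy model (all values are illustrative assumptions):
# "M" = membrane state, "L" = loop state.
# Symbols use a reduced alphabet: "h" = hydrophobic, "p" = polar.
STATES = ("M", "L")
START = {"M": 0.5, "L": 0.5}
TRANS = {"M": {"M": 0.9, "L": 0.1}, "L": {"M": 0.1, "L": 0.9}}
EMIT = {"M": {"h": 0.8, "p": 0.2}, "L": {"h": 0.3, "p": 0.7}}


def viterbi(seq):
    """Return the single most probable state path and its log probability."""
    # score[s] = best log probability of any path that ends in state s
    score = {s: math.log(START[s]) + math.log(EMIT[s][seq[0]]) for s in STATES}
    back = []  # back[t][s] = best predecessor of state s at position t + 1
    for sym in seq[1:]:
        prev, score, ptr = score, {}, {}
        for s in STATES:
            best = max(STATES, key=lambda r: prev[r] + math.log(TRANS[r][s]))
            ptr[s] = best
            score[s] = (prev[best] + math.log(TRANS[best][s])
                        + math.log(EMIT[s][sym]))
        back.append(ptr)
    # Trace back from the best final state to recover the path.
    last = max(STATES, key=lambda s: score[s])
    path = [last]
    for ptr in reversed(back):
        path.append(ptr[path[-1]])
    return "".join(reversed(path)), score[last]


def forward(seq):
    """Return the probability of the sequence summed over all state paths."""
    f = {s: START[s] * EMIT[s][seq[0]] for s in STATES}
    for sym in seq[1:]:
        f = {s: EMIT[s][sym] * sum(f[r] * TRANS[r][s] for r in STATES)
             for s in STATES}
    return sum(f.values())


if __name__ == "__main__":
    path, logp = viterbi("hhhhpppp")
    print(path, logp, forward("hhhhpppp"))
```

In this sketch the Viterbi path labels each position with a state (a toy stand-in for a topology prediction), while the forward probability scores how well the whole sequence fits the model. The forward routine works in plain probabilities for brevity, so a long sequence would need scaling or log-space arithmetic to avoid underflow.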


Keywords: transmembrane, local motif, Viterbi, Baum–Welch, profile, topology, folding



This material is based upon work supported by the National Science Foundation under Grant DBI-0448072 to C.B.



Copyright information

© Humana Press Inc 2008
