An Introduction to Protein Contact Prediction

  • Nicholas Hamilton
  • Thomas Huber
Part of the Methods in Molecular Biology™ book series (MIMB, volume 453)


A fundamental problem in molecular biology is the prediction of the three-dimensional structure of a protein from its amino acid sequence. However, molecular modeling to find the structure is at present intractable and is likely to remain so for some time. Intermediate steps, such as predicting which residue pairs are in contact, have therefore been developed. Predicted contact pairs have been used for fold prediction, as an initial condition or constraint for molecular modeling, and as a filter to rank multiple models arising from homology modeling. As contact prediction has advanced, it has become more common for 3D structure predictors to integrate contact prediction into structure building, because it often gives information that is orthogonal to that produced by other methods. This chapter shows how evolutionary information contained in protein sequences and multiple sequence alignments can be used to predict protein structure, and reviews the state-of-the-art predictors and their methodologies.

Key words

Protein structure prediction, contact prediction, contact map, multiple sequence alignments, CASP



The authors gratefully acknowledge financial support from the University of Queensland, the ARC Australian Centre for Bioinformatics, and the Institute for Molecular Bioscience. The first author would also like to acknowledge the support of Prof. Kevin Burrage's Australian Federation Fellowship.



Copyright information

© Humana Press, a part of Springer Science+Business Media, LLC 2008

Authors and Affiliations

  • Nicholas Hamilton (1)
  • Thomas Huber (2)
  1. ARC Centre of Excellence in Bioinformatics, Institute for Molecular Bioscience and Advanced Computational Modelling Centre, The University of Queensland, Brisbane, Australia
  2. School of Molecular and Microbial Sciences and Australian Institute for Bioengineering and Nanotechnology, The University of Queensland, Brisbane, Australia
