Abstract
This chapter aims to introduce to the specifics of protein structure annotations and their fundamental position in structural bioinformatics, bioinformatics in general. Proteins are profoundly characterised by their structure in every aspect of their functioning and, while over the last decades there has been a close to exponential growth of known protein sequences, the growth of known protein structures has been closer to linear because of the high complexity and cost of determining them. Thus, protein structure predictors are among the most thoroughly assessed tools in bioinformatics (in venues such as CASP or CAMEO) because they allow the structural study of proteins on a large scale. This chapter presents the key types of protein structure annotation and the methods and algorithms for predicting them, with the aim to give both a historical perspective on their development and a snapshot of their current state of the art. From one-dimensional protein annotations – i.e. secondary structure, solvent accessibility and torsion angles – to more complex and informative two-dimensional protein abstractions, i.e. contact maps, both mature and currently developing methods for protein structure annotations are introduced. The aim of this overview is to facilitate the adoption and development of state-of-the-art protein structural predictors. Particular attention is given to some of the best performing and freely available web servers and standalone programmes to predict protein structure annotations.
Access this chapter
Tax calculation will be finalised at checkout
Purchases are for personal use only
Similar content being viewed by others
References
Adhikari B, Hou J, Cheng J (2017) DNCON2: improved protein contact prediction using two-level deep convolutional neural networks. Bioinformatics 34(9):1466–1472
Ahmad S, Gromiha M, Fawareh H, Sarai A (2004) ASAView: database and tool for solvent accessibility representation in proteins. BMC Bioinformatics 5:51
Aloy P, Stark A, Hadley C, Russell RB (2003) Predictions without templates: new folds, secondary structure, and contacts in CASP5. Proteins Struct Funct Bioinforma 53(S6):436–456
Altschul SF et al (1997) Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res 25(17):3389–3402
Baldi P, Brunak S, Frasconi P, Soda G, Pollastri G (1999) Exploiting the past and the future in protein secondary structure prediction. Bioinforma Oxf Engl 15(11):937–946
Bartoli L, Capriotti E, Fariselli P, Martelli PL, Casadio R (2008) The pros and cons of predicting protein contact maps. Methods Mol Biol Clifton NJ 413:199–217
Baú D, Martin AJ, Mooney C, Vullo A, Walsh I, Pollastri G (2006) Distill: a suite of web servers for the prediction of one-, two- and three-dimensional structural features of proteins. BMC Bioinformatics 7:402
Berman HM et al (2000) The Protein Data Bank. Nucleic Acids Res 28(1):235–242
Buchan DWA, Jones DT (2018) Improved protein contact predictions with the MetaPSICOV2 server in CASP12. Proteins 86(Suppl 1):78–83
Buchan DWA, Ward SM, Lobley AE, Nugent TCO, Bryson K, Jones DT (2010) Protein annotation and modelling servers at University College London. Nucleic Acids Res 38(suppl_2):W563–W568
Bystroff C, Thorsson V, Baker D (2000) HMMSTR: a hidden Markov model for local sequence-structure correlations in proteins. J Mol Biol 301(1):173–190
Cheng J, Baldi P (2007) Improved residue contact prediction using support vector machines and a large feature set. BMC Bioinformatics 8:113
Cheng J, Randall AZ, Sweredoski MJ, Baldi P (2005) SCRATCH: a protein structure and structural feature prediction server. Nucleic Acids Res 33(suppl_2):W72–W76
Chou PY, Fasman GD (1974) Prediction of protein conformation. Biochemistry (Mosc) 13(2):222–245
Cornette JL, Cease KB, Margalit H, Spouge JL, Berzofsky JA, DeLisi C (1987) Hydrophobicity scales and computational techniques for detecting amphipathic structures in proteins. J Mol Biol 195(3):659–685
Cuff JA, Clamp ME, Siddiqui AS, Finlay M, Barton GJ (1998) JPred: a consensus secondary structure prediction server. Bioinformatics 14(10):892–893
De Brevern AG, Etchebest C, Hazout S (2000) Bayesian probabilistic approach for predicting backbone structures in terms of protein blocks. Proteins Struct Funct Bioinforma 41(3):271–287
Di Lena P, Fariselli P, Margara L, Vassura M, Casadio R (2010) Fast overlapping of protein contact maps by alignment of eigenvectors. Bioinformatics 26(18):2250–2258
Di Lena P, Fariselli P, Margara L, Vassura M, Casadio R (2011) Is there an optimal substitution matrix for contact prediction with correlated mutations? IEEEACM Trans Comput Biol Bioinforma 8(4):1017–1028
Di Lena P, Nagata K, Baldi P (2012) Deep architectures for protein contact map prediction. Bioinformatics 28(19):2449–2457
Drozdetskiy A, Cole C, Procter J, Barton GJ (2015) JPred4: a protein secondary structure prediction server. Nucleic Acids Res 43(W1):W389–W394
Eickholt J, Cheng J (2012) Predicting protein residue–residue contacts using deep networks and boosting. Bioinformatics 28(23):3066–3072
Faraggi E, Yang Y, Zhang S, Zhou Y (2009) Predicting continuous local structure and the effect of its substitution for secondary structure in fragment-free protein structure prediction. Structure 17(11):1515–1527
Fariselli P, Olmea O, Valencia A, Casadio R (2001) Prediction of contact maps with neural networks and correlated mutations. Protein Eng Des Sel 14(11):835–843
Fauchère JL, Charton M, Kier LB, Verloop A, Pliska V (1988) Amino acid side chain parameters for correlation studies in biology and pharmacology. Int J Pept Protein Res 32(4):269–278
Finn RD, Clements J, Eddy SR (2011) HMMER web server: interactive sequence similarity searching. Nucleic Acids Res 39(Web Server issue):W29–W37
Göbel U, Sander C, Schneider R, Valencia A (1994) Correlated mutations and residue contacts in proteins. Proteins 18(4):309–317
Haas J et al Continuous Automated Model Evaluation (CAMEO) complementing the critical assessment of structure prediction in CASP12. Proteins: Struct Funct Bioinf p. n/a-n/a
Heffernan R et al (2015) Improving prediction of secondary structure, local backbone angles, and solvent accessible surface area of proteins by iterative deep learning. Sci Rep 5:11476
Heffernan R et al (2016) Highly accurate sequence-based prediction of half-sphere exposures of amino acid residues in proteins. Bioinformatics 32(6):843–849
Heffernan R, Yang Y, Paliwal K, Zhou Y (2017) Capturing non-local interactions by long short-term memory bidirectional recurrent neural networks for improving prediction of protein secondary structure, backbone angles, contact numbers and solvent accessibility. Bioinformatics 33(18):2842–2849
Holbrook SR, Muskal SM, Kim SH (1990) Predicting surface exposure of amino acids from protein sequence. Protein Eng 3(8):659–665
Huang Y, Bystroff C (2006) Improved pairwise alignments of proteins in the Twilight Zone using local structure predictions. Bioinformatics 22(4):413–422
Johnson LS, Eddy SR, Portugaly E (2010) Hidden Markov model speed heuristic and iterative HMM search procedure. BMC Bioinformatics 11:431
Jones DT (1999) Protein secondary structure prediction based on position-specific scoring matrices. J Mol Biol 292(2):195–202
Jones DT, Swindells MB (2002) Getting the most from PSI–BLAST. Trends Biochem Sci 27(3):161–164
Jones DT, Buchan DWA, Cozzetto D, Pontil M (2012) PSICOV: precise structural contact prediction using sparse inverse covariance estimation on large multiple sequence alignments. Bioinformatics 28(2):184–190
Jones DT, Singh T, Kosciolek T, Tetchner S (2015) MetaPSICOV: combining coevolution methods for accurate prediction of contacts and long range hydrogen bonding in proteins. Bioinformatics 31(7):999–1006
Kabsch W, Sander C (1983) Dictionary of protein secondary structure: pattern recognition of hydrogen-bonded and geometrical features. Biopolymers 22(12):2577–2637
Kaján L, Hopf TA, Kalaš M, Marks DS, Rost B (2014) FreeContact: fast and free software for protein contact prediction from residue co-evolution. BMC Bioinformatics 15:85
Kendrew JC et al (1960) Structure of myoglobin: a three-dimensional Fourier synthesis at 2 A. resolution. Nature 185(4711):422–427
Kim DE, DiMaio F, Wang RY-R, Song Y, Baker D (2014) One contact for every twelve residues allows robust and accurate topology-level protein structure modeling. Proteins 82(2):208–218
Kinch LN, Li W, Monastyrskyy B, Kryshtafovych A, Grishin NV (2016) Assessment of CASP11 contact-assisted predictions. Proteins 84(Suppl 1):164–180
Kosciolek T, Jones DT (2014) De Novo structure prediction of globular proteins aided by sequence variation-derived contacts. PLoS One 9(3):e92197
Kosciolek T, Jones DT (2016) Accurate contact predictions using covariation techniques and machine learning. Proteins 84(Suppl 1):145–151
Kuang R, Leslie CS, Yang A-S (2004) Protein backbone angle prediction with machine learning approaches. Bioinformatics 20(10):1612–1621
Kukic P, Mirabello C, Tradigo G, Walsh I, Veltri P, Pollastri G (2014) Toward an accurate prediction of inter-residue distances in proteins using 2D recursive neural networks. BMC Bioinformatics 15:6
LeCun Y, Bengio Y, Hinton G (2015) Deep learning. Nature 521(7553):436–444
Lyons J et al (2014) Predicting backbone Cα angles and dihedrals from protein sequences by stacked sparse auto-encoder deep neural network. J Comput Chem 35(28):2040–2046
MacCallum RM (2004) Striped sheets and protein contact prediction. Bioinformatics 20(suppl_1):i224–i231
Magnan CN, Baldi P (2014) SSpro/ACCpro 5: almost perfect prediction of protein secondary structure and relative solvent accessibility using profiles, machine learning and structural similarity. Bioinformatics 30(18):2592–2597
Martin J, Letellier G, Marin A, Taly J-F, de Brevern AG, Gibrat J-F (2005) Protein secondary structure assignment revisited: a detailed analysis of different assignment methods. BMC Struct Biol 5:17
Martin AJ, Mooney C, Walsh I, Pollastri G (2010) Contact map prediction by machine learning. In: Pan Y, Zomaya A, Rangwala H, Karypis G (eds) Introduction to protein structure prediction. Wiley. https://doi.org/10.1002/9780470882207.ch7
Mirabello C, Pollastri G (2013) Porter, PaleAle 4.0: high-accuracy prediction of protein secondary structure and relative solvent accessibility. Bioinformatics 29(16):2056–2058
Monastyrskyy B, D’Andrea D, Fidelis K, Tramontano A, Kryshtafovych A (2014) Evaluation of residue–residue contact prediction in CASP10. Proteins Struct Funct Bioinforma 82:138–153
Monastyrskyy B, D’Andrea D, Fidelis K, Tramontano A, Kryshtafovych A (2016) New encouraging developments in contact prediction: Assessment of the CASP11 results. Proteins 84(Suppl 1):131–144
Mooney C, Pollastri G (2009) Beyond the twilight zone: automated prediction of structural properties of proteins by recursive neural networks and remote homology information. Proteins Struct Funct Bioinforma 77(1):181–190
Mooney C, Vullo A, Pollastri G (2006) Protein structural motif prediction in multidimensional ø-ψ space leads to improved secondary structure prediction. J Comput Biol 13(8):1489–1502
Mooney C, Cessieux A, Shields DC, Pollastri G (2013) SCL-Epred: a generalised de novo eukaryotic protein subcellular localisation predictor. Amino Acids 45(2):291–299
Murzin AG, Brenner SE, Hubbard T, Chothia C (1995) SCOP: a structural classification of proteins database for the investigation of sequences and structures. J Mol Biol 247(4):536–540
Olmea O, Valencia A (1997) Improving contact predictions by the combination of correlated mutations and other sources of sequence information. Fold Des 2:S25–S32
Pascarella S, Persio RD, Bossa F, Argos P (1998) Easy method to predict solvent accessibility from multiple protein sequence alignments. Proteins Struct Funct Bioinforma 32(2):190–199
Pauling L, Corey RB (1951) Configurations of polypeptide chains with favored orientations around single bonds. Proc Natl Acad Sci U S A 37(11):729–740
Pazos F, Helmer-Citterich M, Ausiello G, Valencia A (1997) Correlated mutations contain information about protein-protein interaction. J Mol Biol 271(4):511–523
Perutz MF, Rossmann MG, Cullis AF, Muirhead H, Will G, North AC (1960) Structure of haemoglobin: a three-dimensional Fourier synthesis at 5.5-A. resolution, obtained by X-ray analysis. Nature 185(4711):416–422
Pollastri G, Baldi P (2002) Prediction of contact maps by GIOHMMs and recurrent neural networks using lateral propagation from all four cardinal corners. Bioinformatics 18(suppl_1):S62–S70
Pollastri G, McLysaght A (2005) Porter: a new, accurate server for protein secondary structure prediction. Bioinformatics 21(8):1719–1720
Pollastri G, Baldi P, Fariselli P, Casadio R (2001) Improved prediction of the number of residue contacts in proteins by recurrent neural networks. Bioinformatics 17(suppl_1):S234–S242
Pollastri G, Baldi P, Fariselli P, Casadio R (2002) Prediction of coordination number and relative solvent accessibility in proteins. Proteins 47(2):142–153
Pollastri G, Martin AJ, Mooney C, Vullo A (2007) Accurate prediction of protein secondary structure and solvent accessibility by consensus combiners of sequence and structure information. BMC Bioinformatics 8:201
Remmert M, Biegert A, Hauser A, Söding J (2012) HHblits: lightning-fast iterative protein sequence searching by HMM-HMM alignment. Nat Methods 9(2):173–175
Rost B (2001) Review: protein secondary structure prediction continues to rise. J Struct Biol 134(2):204–218
Rost B, Sander C (1993) Prediction of protein secondary structure at better than 70% accuracy. J Mol Biol 232(2):584–599
Rost B, Sander C (1994) Conservation and prediction of solvent accessibility in protein families. Proteins 20(3):216–226
Roy A, Kucukural A, Zhang Y (2010) I-TASSER: a unified platform for automated protein structure and function prediction. Nat Protoc 5(4):725–738
Schaarschmidt J, Monastyrskyy B, Kryshtafovych A, Bonvin AMJJ (2018) Assessment of contact predictions in CASP12: co-evolution and deep learning coming of age. Proteins Struct Funct Bioinforma 86:51–66
Schäffer AA et al (2001) Improving the accuracy of PSI-BLAST protein database searches with composition-based statistics and other refinements. Nucleic Acids Res 29(14):2994–3005
Schlessinger A, Punta M, Rost B (2007) Natively unstructured regions in proteins identified from contact predictions. Bioinforma Oxf Engl 23(18):2376–2384
Schmidhuber J (2015) Deep learning in neural networks: an overview. Neural Netw 61:85–117
Seemayer S, Gruber M, Söding J (2014) CCMpred—fast and precise prediction of protein residue–residue contacts from correlated mutations. Bioinformatics 30(21):3128–3130
Sims GE, Choi I-G, Kim S-H (2005) Protein conformational space in higher order ϕ-Ψ maps. Proc Natl Acad Sci U S A 102(3):618–621
Tegge AN, Wang Z, Eickholt J, Cheng J (2009) NNcon: improved protein contact map prediction using 2D-recursive neural networks. Nucleic Acids Res 37(suppl_2):W515–W518
The UniProt Consortium (2016) UniProt: the universal protein knowledgebase. Nucleic Acids Res 45(D1):D158–D169
Thomas H (2005) An amino acid has two sides: a new 2D measure provides a different view of solvent exposure. Proteins Struct Funct Bioinforma 59(1):38–48
Ting D, Wang G, Shapovalov M, Mitra R, Jordan MI, Jr RLD (2010) Neighbor-dependent Ramachandran probability distributions of amino acids developed from a Hierarchical Dirichlet process model. PLoS Comput Biol 6(4):e1000763
Torrisi M, Kaleel M, Pollastri G (2018) Porter 5: state-of-the-art ab initio prediction of protein secondary structure in 3 and 8 classes. bioRxiv:289033
Vassura M, Margara L, Di Lena P, Medri F, Fariselli P, Casadio R (2008) Reconstruction of 3D structures from protein contact maps. IEEEACM Trans Comput Biol Bioinforma 5(3):357–367
Vassura M et al (2011) Blurring contact maps of thousands of proteins: what we can learn by reconstructing 3D structure. BioData Min 4:1
Vendruscolo M, Kussell E, Domany E (1997) Recovery of protein structure from contact maps. Fold Des 2(5):295–306
Vullo A, Walsh I, Pollastri G (2006) A two-stage approach for improved prediction of residue contact maps. BMC Bioinformatics 7:180
Walsh I, Baù D, Martin AJ, Mooney C, Vullo A, Pollastri G (2009) Ab initio and template-based prediction of multi-class distance maps by two-dimensional recursive neural networks. BMC Struct Biol 9:5
Walsh I, Pollastri G, Tosatto SCE (2016) Correct machine learning on protein sequences: a peer-reviewing perspective. Brief Bioinform 17(5):831–840
Wang S, Li W, Liu S, Xu J (2016) RaptorX-property: a web server for protein structure property prediction. Nucleic Acids Res 44(W1):W430–W435
Wang S, Sun S, Li Z, Zhang R, Xu J (2017) Accurate de novo prediction of protein contact map by ultra-deep learning model. PLoS Comput Biol 13(1):e1005324
Wang S, Sun S, Xu J (2018) Analysis of deep learning methods for blind protein contact prediction in CASP12. Proteins 86(Suppl 1):67–77
Waterhouse AM, Procter JB, Martin DMA, Clamp M, Barton GJ (2009) Jalview version 2—a multiple sequence alignment editor and analysis workbench. Bioinformatics 25(9):1189–1191
Wood MJ, Hirst JD (2005) Protein secondary structure prediction with dihedral angles. Proteins Struct Funct Bioinforma 59(3):476–481
Xia L, Pan X-M (2000) New method for accurate prediction of solvent accessibility from protein sequence. Proteins Struct Funct Bioinforma 42(1):1–5
Yang Y, Faraggi E, Zhao H, Zhou Y (2011) Improving protein fold recognition and template-based modeling by employing probabilistic-based matching between predicted one-dimensional structural properties of query and corresponding native properties of templates. Bioinformatics 27(15):2076–2082
Yang Y et al (2016) Sixty-five years of the long march in protein secondary structure prediction: the final stretch? Brief Bioinform 19(3):482–494
Yuan Z (2005) Better prediction of protein contact number using a support vector regression analysis of amino acid sequence. BMC Bioinformatics 6:248
Zemla A (2003) LGA: a method for finding 3D similarities in protein structures. Nucleic Acids Res 31(13):3370–3374
Zemla A, Venclovas Č, Fidelis K, Rost B (1999) A modified definition of Sov, a segment-based measure for protein secondary structure prediction assessment. Proteins Struct Funct Bioinforma 34(2):220–223
Author information
Authors and Affiliations
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2019 Springer Nature Switzerland AG
About this chapter
Cite this chapter
Torrisi, M., Pollastri, G. (2019). Protein Structure Annotations. In: Shaik, N., Hakeem, K., Banaganapalli, B., Elango, R. (eds) Essentials of Bioinformatics, Volume I. Springer, Cham. https://doi.org/10.1007/978-3-030-02634-9_10
Download citation
DOI: https://doi.org/10.1007/978-3-030-02634-9_10
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-02633-2
Online ISBN: 978-3-030-02634-9
eBook Packages: Biomedical and Life SciencesBiomedical and Life Sciences (R0)