Divide and Conquer Strategies for Protein Structure Prediction

Di Lena, Pietro; Fariselli, Piero; Margara, Luciano; Vassura, Marco; Casadio, Rita

doi:10.1007/978-1-4419-6800-5_2

Abstract

In this chapter, we discuss approaches to protein structure prediction that address “simpler” sub-problems. The rationale behind this strategy is to develop methods for predicting structural characteristics of the protein that are useful per se and, at the same time, can help in solving the main problem. In particular, we discuss the prediction of protein secondary structure, which is at the moment one of the most successfully addressed sub-problems in computational biology. Available secondary structure predictors are very reliable and can be routinely used for annotating new genomes or as input for other, more complex prediction tasks, such as remote homology detection and functional assignment. As a second example, we discuss the prediction of residue–residue contacts in proteins. This task is much more complex than secondary structure prediction, and no satisfactory results have been achieved so far. Unlike the secondary structure sub-problem, the residue–residue contact sub-problem is not intrinsically simpler than the prediction of the protein structure itself, since a roughly correct set of predicted residue–residue contacts would lead directly to a protein backbone very close to the real structure. Both sub-problems are discussed in the light of current performance evaluations, which are based on periodic blind tests (CASP meetings) and continuous evaluation (EVA servers).
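To make the two sub-problems concrete, the following is a minimal sketch, not the chapter's own method. It assumes the common conventions that secondary structure is scored by the fraction of residues assigned the correct state among helix (H), strand (E), and coil (C), usually called Q3, and that two residues are "in contact" when their representative atoms lie within a distance threshold (8 Å is a frequently used value). The function names, the threshold default, and the toy coordinates are all hypothetical choices made for illustration.

```python
import math


def q3(observed, predicted):
    """Fraction of residues whose predicted state (H, E or C) matches the observed one."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(o == p for o, p in zip(observed, predicted))
    return matches / len(observed)


def contact_map(coords, threshold=8.0, min_separation=1):
    """Symmetric boolean matrix: True where two residues lie within `threshold` angstroms.

    `coords` holds one (x, y, z) point per residue (an assumed representative atom).
    Pairs closer than `min_separation` positions along the chain are left False.
    """
    n = len(coords)
    cmap = [[False] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + min_separation, n):
            if math.dist(coords[i], coords[j]) <= threshold:
                cmap[i][j] = cmap[j][i] = True
    return cmap


# Toy data (hypothetical, not a real protein).
accuracy = q3("HHHEECCC", "HHHEECCH")
toy_coords = [
    (0.0, 0.0, 0.0),
    (3.8, 0.0, 0.0),
    (7.6, 0.0, 0.0),
    (11.4, 0.0, 0.0),
    (3.8, 5.0, 0.0),
]
cmap = contact_map(toy_coords)
```

The sketch only shows how each target is defined from a known structure. Predicting the state string or the contact matrix from sequence alone is the hard part that the chapter addresses.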

Notes

1. http://predictioncenter.org/
2. http://cubic.bioc.columbia.edu/eva/
3. BAliBASE3 database: http://www-bio3d-igbmc.u-strasbg.fr/balibase/
4. Jalview software: http://www.jalview.org/
5. http://cubic.bioc.columbia.edu/eva/doc/intro_sec.html
6. http://www.predictprotein.org/
7. http://bioinf.cs.ucl.ac.uk/psipred/
8. http://cubic.bioc.columbia.edu/eva/doc/intro_con.html
9. http://gpcr.biocomp.unibo.it/cgi/predictors/cornet/pred_cmapcgi.cgi
10. http://cubic.bioc.columbia.edu/services/profcon/
11. http://compbio.soe.ucsc.edu/SAM_T06/T06-query.html

Author information

Authors and Affiliations

Department of Computer Science, University of Bologna, Bologna, Italy
Pietro Di Lena

Corresponding author

Correspondence to Pietro Di Lena.

Editor information

Editors and Affiliations

Dipto. Informatica e Sistemistica, Università Roma, La Sapienza, Via Ariosto 25, Roma, 00185, Italy
Renato Bruni

About this chapter

Cite this chapter

Di Lena, P., Fariselli, P., Margara, L., Vassura, M., Casadio, R. (2011). Divide and Conquer Strategies for Protein Structure Prediction. In: Bruni, R. (ed.) Mathematical Approaches to Polymer Sequence Analysis and Related Problems. Springer, New York, NY. https://doi.org/10.1007/978-1-4419-6800-5_2

DOI: https://doi.org/10.1007/978-1-4419-6800-5_2
Published: 21 September 2010
Publisher Name: Springer, New York, NY
Print ISBN: 978-1-4419-6799-2
Online ISBN: 978-1-4419-6800-5
eBook Packages: Biomedical and Life Sciences (R0)
